Abstract

The Read/Conditional-Write (R/CW) protocol provides linearizable reads and conditional-writes of
individual objects. A client’s conditional-write of an object succeeds only if the object has not been
conditionally-written since it was last read by the client. In this sense, R/CW semantics are similar
to those of a compare-and-swap register. If a conditional-write does not succeed, it aborts. The
R/CW protocol supports multi-object reads and conditional-writes; such operations are strictly
serializable. A variant of the R/CW protocol, the Query/Update (Q/U) protocol, provides an
operations-based interface to clients: clients invoke query and update methods on objects rather
than reading and writing objects in their entirety. The R/CW and Q/U protocols are correct in
the asynchronous timing model and tolerate Byzantine failures of clients and servers.
1 Introduction


The Read/Conditional-Write (R/CW) protocol provides linearizable reads and conditional-writes
of individual objects. A client’s conditional-write of an object succeeds only if the object has not
been conditionally-written since it was last read by the client. In this sense, R/CW semantics are
similar to those of a compare-and-swap register. The R/CW protocol supports multi-object reads
and conditional-writes; such operations are strictly serializable. A variant of the R/CW protocol,
the Query/Update (Q/U) protocol, provides an operations-based interface to clients: clients invoke
query and update methods on objects rather than reading and writing objects in their entirety. The
R/CW and Q/U protocols are correct in the asynchronous timing model and tolerate Byzantine
faults of clients and servers.

This technical report is a companion to the paper “Fault-scalable Byzantine fault-tolerant
services” [1] and is made available to ensure timely dissemination of details elided from the paper.
The motivation for these protocols, the intuition that underlies their structure, their relation to
related work, and empirical results are found in the main paper. In this technical report, we present a
proof of correctness and detailed pseudo-code for the R/CW protocol for individual objects (§2 and
§3, respectively). We also discuss how to extend the R/CW protocol to accommodate multi-object
conditional-writes (§4). Finally, we present extended pseudo-code for the Q/U protocol, a variant
of the R/CW protocol, and an extended example execution of the Q/U protocol (§5).

2 Safety of the Read/Conditional-Write (R/CW) protocol

Using terminology developed throughout the proof, the proof of safety for the R/CW protocol for
individual objects is outlined as follows. First, we provide terminology and system definitions.
Second, we define classification rules for candidates. Third, we define the types of conditional-writes,
along with the pre-conditions and post-conditions for each type of conditional-write. Fourth, we
focus on the properties of a write that establishes a value candidate and of a copy that establishes a
value candidate; these two types of conditional-writes define segments. We show that the start and
end logical times of such segments never overlap and that, in fact, the start of each segment
corresponds to the end of another segment (except for the segment that starts at the well-defined
initial value). As such, there is a single chain of values, called the conditioned-on chain, from the
latest established value candidate back to the initial value candidate. The conditioned-on chain is
defined by the conditioned-on timestamps of its members.

Given this view of segments, we demonstrate that conditional-writes that complete can be totally
ordered by logical timestamp and that this ordering is linearizable [5]. Moreover, given the segments
and the linearized order of conditional-writes, we demonstrate that reads that return a value can
be partially ordered by the logical timestamp of the value they return. This partial order of reads
can be arbitrarily extended to a total order, which is shown to be linearizable. This demonstrates
that the set of all reads and conditional-writes that return values is linearizable.

The R/CW protocol uses five types of conditional-writes. A write writes a new value candidate
conditioned-on an established value candidate that is the latest candidate in the object history
set. An inline_write writes a value candidate to additional servers; this is done if the latest
timestamp in the object history set corresponds to a repairable value candidate. A barrier writes
a barrier candidate; this is done if the latest timestamp in the object history set corresponds
to an incomplete candidate (value or barrier). An inline_barrier writes a barrier candidate to
additional servers; this is done if the latest timestamp in the object history set corresponds to a
repairable barrier candidate. A copy copies a potential or established value candidate forward to
a new timestamp; this is done if the latest timestamp in the object history set corresponds to a
complete barrier candidate. The candidate written has the same data value as the latest potential
or established value candidate that precedes the established barrier candidate.
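The selection rule just described can be summarized as a small decision function. The sketch below is a simplification under stated assumptions: it looks only at whether the candidate with the latest timestamp is a barrier and at how that candidate was classified, and it assumes a write is chosen exactly when that candidate is a value candidate classified complete. The names `Classification` and `select_cw_type` are hypothetical and are not taken from the paper's pseudo-code.

```python
from enum import Enum


class Classification(Enum):
    """Hypothetical labels for the three ways a candidate can be classified."""
    COMPLETE = "complete"
    REPAIRABLE = "repairable"
    INCOMPLETE = "incomplete"


def select_cw_type(latest_is_barrier: bool, latest_class: Classification) -> str:
    """Choose the conditional-write type from the candidate with the latest timestamp.

    Simplifying assumption: a write is chosen when the latest candidate is a
    value candidate classified complete.
    """
    if latest_class is Classification.INCOMPLETE:
        # An incomplete value or barrier candidate is covered by a new barrier.
        return "barrier"
    if latest_is_barrier:
        # Repairable barrier -> repair it; complete barrier -> copy the value forward.
        if latest_class is Classification.REPAIRABLE:
            return "inline_barrier"
        return "copy"
    # Repairable value -> repair it; complete value -> write a new value.
    if latest_class is Classification.REPAIRABLE:
        return "inline_write"
    return "write"


print(select_cw_type(True, Classification.COMPLETE))  # copy
```

Keeping the rule as a pure function of the latest candidate mirrors the text's point that the type of conditional-write is determined entirely by the object history set.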

2.1 Terminology

In this section, we introduce terminology to describe the R/CW protocol. Some symbols and
structures used in the pseudo-code are also used in the proof. Figure 1 and Table 1, both on
page 15, may be helpful to the reader.

Definition 2.1 (client, server, channels, shared keys, operation, request). The R/CW protocol
operates in a system comprised of clients and servers. Point-to-point authenticated channels exist
among all servers and between all clients and servers. Channels are assumed to be unreliable, with
the same properties as those used by Aguilera et al. in the crash-recovery model (i.e., channels do
not create messages, channels may duplicate messages a finite number of times, and channels may
drop messages a finite number of times) [2]. Such channels can be made reliable by repeatedly
resending requests. An infrastructure for deploying shared keys among pairs of servers is assumed
to exist. Clients issue operations (reads and conditional-writes) to sets of servers. Each operation is
comprised of requests that a client sends directly to each server.

Definition 2.2 (asynchronous timing model). The R/CW protocol operates safely in an asynchronous
system. No assumptions are made about the duration of message transmission delays or
the execution rates of clients and servers, except that they are non-zero.

Definition 2.3 (hybrid failure model, Byzantine failures, crash-recovery failures, benign, malevolent,
good, faulty). Byzantine faulty components may exhibit arbitrary, potentially malicious,
behavior [6]. Clients may be Byzantine faulty. The server model is a hybrid failure model [9, 10]
of Byzantine and crash-recovery failures. We use the crash-recovery failure model of Aguilera et
al. [2]. Servers have persistent storage that is durable through a crash and subsequent recovery. In
the hybrid crash-recovery-Byzantine fault model, every server is either always-up, eventually-up,
eventually-down, unstable, or malevolent. Since the Byzantine failure model is a strict generalization
of the crash-recovery failure model, another term, malevolent, is used to categorize those
servers that in fact exhibit out-of-specification, non-crash behavior. A server is good if it is either
always-up or eventually-up (i.e., it may crash, but there is a time after which it is always-up). A
server is faulty if it is unstable, eventually-down, or malevolent. As such, every server is either
good or faulty. A server is benign if it obeys its specification except for crashes and recoveries. As
such, every server is either benign or malevolent.

Definition 2.4 (computationally bounded adversary). Clients and servers are assumed to be
computationally bounded so that cryptographic primitives are effective.

Definition 2.5 (universe, $U$, $n$, quorum, $Q$, quorum system, $\mathcal{Q}$). The quorum system
definition is based on that of Malkhi and Reiter [7]. We assume a universe $U$ of servers such that
$|U| = n$. A quorum system $\mathcal{Q} \subseteq 2^U$ is a non-empty set of subsets of $U$, every pair of which
intersect. Each $Q \in \mathcal{Q}$ is called a quorum. The notation $2^{set}$ denotes the power set of $set$.

Definition 2.6 (failure prone system, fault set, $T$, $\mathcal{T}$, malevolent fault set, $B$, $\mathcal{B}$). We extend
the definition of a failure prone system of Malkhi and Reiter [7] to accommodate the hybrid server
failure model. We assume that in any execution, for $\mathcal{T} \subseteq 2^U$, some $T \in \mathcal{T}$ contains all faulty
servers. We assume that in any execution, for $\mathcal{B} \subseteq 2^U$, some $B \in \mathcal{B}$ contains all malevolent
servers. It follows from the definitions of faulty and malevolent that $\forall B \in \mathcal{B}, \exists T \in \mathcal{T} : B \subseteq T$.
Definition 2.7 (candidate, accept, Data). A client conditional-write request generates a candidate
at the server. Servers accept candidates if validation passes. The structure for a candidate,
Candidate, is given in Figure 1. Every candidate contains data, denoted Data.

Definition 2.8 (logical timestamp (timestamp), LT, LT_co). Logical timestamps are used
extensively in the R/CW protocol. Each candidate contains two logical timestamps: the logical
timestamp of the candidate (LT) and the logical timestamp of the candidate it is conditioned-on
(LT_co). In the remainder of this paper, we refer simply to timestamps rather than logical timestamps.

Definition 2.9 (comparing timestamps, =, <). Timestamps can be compared with the = and <
operators. Equality (=) is defined naturally (all elements of the timestamps must be identical). Less
than (<) is defined by comparing the elements of the timestamps in their order of definition
(i.e., Time, then BarrierFlag, then ClientID, then DataVerifier, and finally OHSVerifier). To
compare the BarrierFlag element, false < true. To compare the ClientID, DataVerifier, and
OHSVerifier elements, lexicographic comparisons are performed (e.g., using memcmp).
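The ordering in Definition 2.9 is a lexicographic comparison over the timestamp's fields in declaration order, which Python dataclasses provide directly when `order=True` is set. The sketch below is illustrative only: the snake_case field names are assumptions mapped from Time, BarrierFlag, ClientID, DataVerifier, and OHSVerifier, and it assumes the byte-string fields have fixed lengths so that Python's `bytes` comparison behaves like memcmp.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class LogicalTimestamp:
    """Sketch of an R/CW logical timestamp (field names are illustrative).

    With order=True, instances compare like tuples of their fields in
    declaration order, so the field order below encodes Definition 2.9.
    """
    time: int
    barrier_flag: bool    # False < True, as in Definition 2.9
    client_id: bytes      # bytes compare lexicographically, like memcmp
    data_verifier: bytes  # when values have equal length
    ohs_verifier: bytes


value_ts = LogicalTimestamp(3, False, b"client-a", b"\x01", b"\x10")
barrier_ts = LogicalTimestamp(3, True, b"client-a", b"\x00", b"\x00")
later_ts = LogicalTimestamp(4, False, b"client-b", b"\x00", b"\x00")

print(value_ts < barrier_ts < later_ts)  # True
```

Because BarrierFlag is compared before ClientID, a barrier at a given Time orders after every value candidate with the same Time, whatever their remaining fields.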

Observation 2.10. We observe that although LT.DataVerifier is a cryptographic hash of the
Data in a candidate, it is not guaranteed to be unique: many candidates in a replica history
may have the same Data. We also observe that LT.OHSVerifier is unique for all candidates in
a replica history. As will be seen, the object history set (and data) sent to a server by a client
uniquely determines the candidate it accepts. A server can accept a candidate only once, since it is
impossible for an object history set to be current (cf. Definition 2.34) at server s if s has already
accepted the candidate corresponding to ObjectHistorySet. Finally, the ClientID is included in the
timestamp to distinguish similar conditional-writes from different clients.

Definition 2.11 (replica history, s.ReplicaHistory). If a server s accepts a candidate, it places the
candidate in its replica history, s.ReplicaHistory.

Definition 2.12 (object history set, ObjectHistorySet). Clients read replica histories from
servers and store them in an array called the object history set (ObjectHistorySet). The object
history set is indexed by server; e.g., ObjectHistorySet[s] is equal to the last replica history
returned from server s (i.e., s.ReplicaHistory).

Definition 2.13 (initial candidate). There is a well-known initial candidate: (0, 0, ⊥). All servers
initialize their replica histories to contain the initial candidate.

2.2 Candidates, constraints, and classification

In this section, we present the constraints placed on quorum systems by the R/CW protocol. We
define established and potential candidates. Based on these definitions, we develop Definition 2.21
and Definition 2.22, which define the quorum intersection properties necessary to provide
read/conditional-write semantics.

Definition 2.14 (established candidate). An established candidate is accepted at all of the benign
servers in some quorum. Note that a subset of the servers in a quorum may be malevolent, and we
cannot specify what “accept” means at such servers.

Definition 2.15 (repairable sets). We extend the quorum system definition to include repairable
sets. Each quorum $Q \in \mathcal{Q}$ defines a set of repairable sets $\mathcal{R}(Q) \subseteq 2^U$.
Definition 2.16 (classifying a candidate complete). A candidate is classified complete if, given a
set of server responses $S$, a quorum of servers share a common candidate:

$$\exists Q \in \mathcal{Q} : Q \subseteq S \Rightarrow \text{complete}.$$

Definition 2.17 (classifying a candidate repairable). A candidate is classified repairable if, given
a set of server responses $S$, the servers of some repairable set share a common candidate and that
candidate is not classifiable as complete:

$$(\forall Q \in \mathcal{Q} : Q \not\subseteq S) \wedge (\exists Q \in \mathcal{Q}, \exists R \in \mathcal{R}(Q) : R \subseteq S) \Rightarrow \text{repairable}.$$

Definition 2.18 (classifying a candidate incomplete). A candidate is classified incomplete if it is
not classifiable as complete or repairable.
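Definitions 2.16 through 2.18 can be read as a three-way test on the set $S$ of servers whose responses share a candidate. The sketch below applies them to an explicitly enumerated quorum system. The function name `classify` and the example system (6 servers, quorums of size 5, every 3-server subset of a quorum treated as a repairable set) are hypothetical choices made only for illustration; they are not the paper's parameters.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Servers = FrozenSet[int]


def classify(S: Servers, quorums: List[Servers],
             repairable: Dict[Servers, List[Servers]]) -> str:
    """Classify a candidate given the set S of servers whose responses share it."""
    # Definition 2.16: some quorum is entirely contained in S.
    if any(Q <= S for Q in quorums):
        return "complete"
    # Definition 2.17: not complete, but some repairable set is contained in S.
    if any(R <= S for Q in quorums for R in repairable[Q]):
        return "repairable"
    # Definition 2.18: neither complete nor repairable.
    return "incomplete"


# Illustrative (hypothetical) threshold system: 6 servers, quorums of
# size 5, and every 3-server subset of a quorum is a repairable set.
U = frozenset(range(6))
QUORUMS = [frozenset(q) for q in combinations(U, 5)]
REPAIRABLE = {Q: [frozenset(r) for r in combinations(Q, 3)] for Q in QUORUMS}

print(classify(frozenset({0, 1, 2, 3, 4}), QUORUMS, REPAIRABLE))  # complete
print(classify(frozenset({0, 1, 2}), QUORUMS, REPAIRABLE))        # repairable
print(classify(frozenset({0, 1}), QUORUMS, REPAIRABLE))           # incomplete
```

Enumerating quorums explicitly keeps the sketch faithful to the set-based definitions; a threshold implementation would instead compare set sizes against fixed bounds.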

Definition 2.19 (potential candidate). A potential candidate is accepted at all of the benign servers
in some repairable set.

Definition 2.20 (quorum size and asynchrony). To ensure that conditional-writes may complete
in an asynchronous system,

$$\forall Q \in \mathcal{Q}, \forall T \in \mathcal{T} : Q \cup T \subseteq U.$$

Definition 2.21 (established candidate intersects potential candidate). We restrict the system such
that an established candidate must intersect a potential candidate in at least one benign server:

$$\forall Q_i, Q_j \in \mathcal{Q}, \forall B \in \mathcal{B}, \forall R \in \mathcal{R}(Q_j) : Q_i \cap R \not\subseteq B.$$

Definition 2.22 (established candidates classified as repairable). We restrict the system such that
some repairable set of an established candidate fully intersects every other quorum; this ensures
that an established candidate is classified as repairable or complete:

$$\forall Q_i, Q_j \in \mathcal{Q}, \forall B \in \mathcal{B}, \exists R \in \mathcal{R}(Q_i) : R \subseteq (Q_i \cap Q_j) \setminus B.$$
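For small, explicitly enumerated systems, the intersection requirement of Definition 2.21 (for every pair of quorums $Q_i, Q_j$, every repairable set $R$ of $Q_j$, and every malevolent fault set $B$, the overlap $Q_i \cap R$ contains a server outside $B$) can be checked by brute force. The sketch below is an illustrative checker, not part of the protocol; the helper `threshold_system` and the example sizes (6 servers, quorums of 5, at most one malevolent server) are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Servers = FrozenSet[int]


def satisfies_def_2_21(quorums: List[Servers],
                       repairable: Dict[Servers, List[Servers]],
                       malevolent_sets: List[Servers]) -> bool:
    """Brute-force check: Q_i ∩ R is never contained in a malevolent fault set B."""
    for Qi in quorums:
        for Qj in quorums:
            for R in repairable[Qj]:
                overlap = Qi & R
                if any(overlap <= B for B in malevolent_sets):
                    return False  # overlap could consist only of malevolent servers
    return True


def threshold_system(n: int, q: int, r: int):
    """Hypothetical threshold instance: quorums of size q, repairable sets of size r."""
    U = frozenset(range(n))
    quorums = [frozenset(c) for c in combinations(U, q)]
    repairable = {Q: [frozenset(c) for c in combinations(Q, r)] for Q in quorums}
    return U, quorums, repairable


U, QS, RS = threshold_system(6, 5, 3)
SINGLE_FAULTS = [frozenset({s}) for s in U]  # at most one malevolent server
print(satisfies_def_2_21(QS, RS, SINGLE_FAULTS))  # True
```

With repairable sets of size 3, any quorum overlaps any repairable set in at least 5 + 3 - 6 = 2 servers, so a single malevolent server cannot cover the overlap; shrinking repairable sets to size 2 breaks the property.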

Definition 2.23 (candidate, value candidate, barrier candidate). To clarify terminology used in the
remainder of the proof, we may discuss established/potential candidates (which may be values or
barriers), established/potential value candidates (which are not barriers), and established/potential
barrier candidates (which are not values).

Observation 2.24. An established candidate may be classified as complete or repairable, but
never incomplete. A potential candidate (that is not established) may be classified as repairable
or incomplete, but never complete. A candidate that is classified as complete is established. A
candidate that is classified as repairable may or may not be established. A candidate that is
classified as incomplete is not established.

2.3 Classification tuple

In this section, we define the classification tuple, which is based on an object history set. The
classification tuple is a summary of the object history set that indicates the latest value candidate,
the latest barrier candidate, and the type of conditional-write that must be performed.
Definition 2.25 (classification tuple, rcw_classify). The function rcw_classify identifies the
classification tuple of a given ObjectHistorySet (see Figure 3):

(CWType, LatestCandidate, LatestBarrier) := rcw_classify(ObjectHistorySet)

Classification is performed by identifying the candidate with the latest timestamp, the value
candidate with the latest timestamp that is classified either repairable or complete, and the barrier
candidate with the latest timestamp that is classified either repairable or complete. This information
determines which type of conditional-write must be performed. The terms in the classification
tuple summarize the results of classification.

CWType is the type of conditional-write to perform.

LatestCandidate is the latest value candidate.

LatestBarrier is the latest barrier candidate.

Lemma 2.26. CWType returned by rcw_classify is in the set {write, inline_write, barrier,
inline_barrier, copy}.

Proof. This lemma is trivially true (see the pseudo-code).

Definition 2.27 (conditioned on). We refer to the object history set for a conditional-write as
the conditioned-on ObjectHistorySet. We also refer to the LatestCandidate of the conditioned-on
ObjectHistorySet (determined from classification) as the conditioned-on value candidate. Note
that a candidate with logical timestamp LT is conditioned on the object history set with
hash(ObjectHistorySet) = LT.OHSVerifier.


2.4 Conditional-write definitions

In this section, specific types of conditional-writes are defined. We develop the safety guarantees in
terms of these types of conditional-writes. Servers can determine which type of conditional-write
must be performed based exclusively on the ObjectHistorySet. The server constructs the candidate
to accept (i.e., the tuple (LT, LT_co, Data)) based on the classification of the conditioned-on object
history set and the data value passed in (if it is a write).


Definition 2.28 (write). write(Data, ObjectHistorySet) conditionally-writes the value Data.

Definition 2.29 (inline_write). inline_write(⊥, ObjectHistorySet) repairs LatestCandidate.

Definition 2.30 (barrier). barrier(⊥, ObjectHistorySet) conditionally-writes a barrier.

Definition 2.31 (inline_barrier). inline_barrier(⊥, ObjectHistorySet) repairs LatestBarrier.

Definition 2.32 (copy). copy(⊥, ObjectHistorySet) copies forward LatestCandidate.


2.5 Conditional-write pre-conditions

In this section, we define the server validation that is performed before accepting a conditional-write
candidate. For each type of conditional-write, the necessary pre-conditions are identified. Let
CT = rcw_classify(ObjectHistorySet).

rcw_latest_time(ObjectHistorySet)

100: return (max(ObjectHistorySet[U].ReplicaHistory.LT))




Definition 2.33 (current time, LT_current). The current time LT_current for an ObjectHistorySet is
defined as:

$$LT_{current} = \begin{cases}
\text{CT.LatestCandidate.LT} & \text{if CT.CWType} = \text{write},\\
\text{rcw\_latest\_time(ObjectHistorySet)} & \text{if CT.CWType} = \text{barrier},\\
\text{CT.LatestCandidate.LT} & \text{if CT.CWType} = \text{inline\_write},\\
\text{CT.LatestBarrier.LT} & \text{if CT.CWType} = \text{inline\_barrier},\\
\text{CT.LatestBarrier.LT} & \text{if CT.CWType} = \text{copy}.
\end{cases}$$

The current time identifies the logical timestamp prior to which the server’s replica
history does not affect classification. Note that rcw_latest_time is defined as
max(ObjectHistorySet[U].ReplicaHistory.LT) in the pseudo-code (see line 100 of the pseudo-code on page 18).

Definition 2.34 (current). For benign server s, if max(s.ReplicaHistory.LT) ≤ LT_current, then
current(ObjectHistorySet) := true.

Otherwise, current(ObjectHistorySet) := false.
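The current-time rule and the currency test can be sketched together. The example below is a simplification under stated assumptions: timestamps are truncated to (Time, BarrierFlag, ClientID) tuples, which Python compares element-wise; a replica history is modeled as a plain list of timestamps; and an object history set is considered current at a server when the server's latest timestamp does not exceed LT_current, so that a server holding exactly the conditioned-on candidate can accept the conditional-write. The function names `lt_current` and `is_current` are hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified timestamp: (Time, BarrierFlag, ClientID); tuples compare element-wise.
Timestamp = Tuple[int, bool, bytes]


def rcw_latest_time(ohs: Dict[str, List[Timestamp]]) -> Timestamp:
    """Latest timestamp in any replica history of the object history set."""
    return max(lt for history in ohs.values() for lt in history)


def lt_current(cw_type: str, latest_candidate_lt: Timestamp,
               latest_barrier_lt: Timestamp,
               ohs: Dict[str, List[Timestamp]]) -> Timestamp:
    """Current time, following the case analysis of Definition 2.33."""
    if cw_type in ("write", "inline_write"):
        return latest_candidate_lt
    if cw_type == "barrier":
        return rcw_latest_time(ohs)
    if cw_type in ("inline_barrier", "copy"):
        return latest_barrier_lt
    raise ValueError(f"unknown conditional-write type: {cw_type}")


def is_current(replica_history: List[Timestamp], current_lt: Timestamp) -> bool:
    """Assumed reading of Definition 2.34: nothing newer than LT_current."""
    return max(replica_history) <= current_lt


t0 = (0, False, b"")     # initial candidate
t1 = (1, False, b"c1")   # value candidate written by client c1
ohs = {"s1": [t0, t1], "s2": [t0, t1], "s3": [t0]}

now = lt_current("write", t1, t0, ohs)
print(now == t1, is_current([t0, t1], now), is_current([t0, t1, (2, False, b"c2")], now))
# True True False
```

The last call shows why a server cannot accept twice on the same object history set: once it holds a newer candidate, its latest timestamp exceeds LT_current.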

Observation 2.35. Validating authenticators is one aspect of server validation. If an HMAC in an
authenticator does not validate for ObjectHistorySet[s], then the server ignores ObjectHistorySet[s]
(i.e., the server sets ObjectHistorySet[s] := {(0, 0, ⊥)}). As such, authenticator validation can only
affect whether an ObjectHistorySet is current.
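The sketch below illustrates the rule in Observation 2.35 using Python's standard `hmac` module. The authenticator layout is an assumption, since the observation does not specify it: the server that produced a replica history attaches one HMAC per verifying server, each computed with the key that the two servers share (Definition 2.1 assumes pairwise shared keys). The `repr`-based serialization and the hash-derived demo keys are hypothetical placeholders and are not secure.

```python
import hashlib
import hmac
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

History = List[Tuple]
INITIAL_HISTORY: History = [(0, 0, None)]  # stands in for {(0, 0, ⊥)}


def serialize(history: History) -> bytes:
    # Hypothetical encoding; a real system needs an unambiguous canonical one.
    return repr(history).encode()


def tag(key: bytes, history: History) -> bytes:
    return hmac.new(key, serialize(history), hashlib.sha256).digest()


def make_authenticator(history: History, producer: str,
                       keys: Dict[FrozenSet[str], bytes],
                       servers: List[str]) -> Dict[str, bytes]:
    """One HMAC per verifying server, keyed with the producer/verifier shared key."""
    return {v: tag(keys[frozenset((producer, v))], history) for v in servers}


def validate_ohs(ohs: Dict[str, History], auths: Dict[str, Dict[str, bytes]],
                 me: str, keys: Dict[FrozenSet[str], bytes]) -> Dict[str, History]:
    """Replace every entry whose HMAC for server `me` fails with the initial history."""
    checked = {}
    for s, history in ohs.items():
        received = auths.get(s, {}).get(me)
        expected = tag(keys[frozenset((s, me))], history)
        ok = received is not None and hmac.compare_digest(received, expected)
        checked[s] = history if ok else INITIAL_HISTORY
    return checked


servers = ["s1", "s2", "s3"]
# Demo-only keys derived from server names (not secure).
keys = {frozenset((a, b)): hashlib.sha256(f"{min(a, b)}|{max(a, b)}".encode()).digest()
        for a, b in product(servers, repeat=2)}
h = [(0, 0, None), (1, 0, "x")]
auths = {s: make_authenticator(h, s, keys, servers) for s in ("s1", "s2")}
ohs = {"s1": h, "s2": h + [(2, 0, "forged")]}  # s2's entry was tampered with
result = validate_ohs(ohs, auths, "s3", keys)
print(result["s1"] == h, result["s2"] == INITIAL_HISTORY)  # True True
```

Replacing a failed entry with the initial history, rather than rejecting the whole request, matches the observation that validation failures only influence whether the object history set is current.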

Lemma 2.36. Benign server s accepts write(Data, ObjectHistorySet) only if CT.CWType =
write and current(ObjectHistorySet) = true.

Proof. See the pseudo-code for the pre-condition.

Lemma 2.37. Benign server s accepts inline_write(⊥, ObjectHistorySet) only if CT.CWType =
inline_write and current(ObjectHistorySet) = true.

Proof. See the pseudo-code for the pre-condition.

Lemma 2.38. Benign server s accepts barrier(⊥, ObjectHistorySet) only if CT.CWType =
barrier and current(ObjectHistorySet) = true.

Proof. See the pseudo-code for the pre-condition.

Lemma 2.39. Benign server s accepts inline_barrier(⊥, ObjectHistorySet) only if
CT.CWType = inline_barrier and current(ObjectHistorySet) = true.

Proof. See the pseudo-code for the pre-condition.

Lemma 2.40. Benign server s accepts copy(⊥, ObjectHistorySet) only if CT.CWType = copy
and current(ObjectHistorySet) = true.

Proof. See the pseudo-code for the pre-condition.
2.6 Conditional-write post-conditions

In this section, we list the post-conditions for conditional-writes that establish candidates. The
post-conditions are given in terms of an object history set comprised of the replica histories of the
benign servers that accepted the conditional-write candidate. The post-condition takes effect as
soon as the candidate is established (cf. Definition 2.14).

Definition 2.41 (post-conditions, ObjectHistorySet'). We define ObjectHistorySet' to be the object
history set comprised of the replica histories returned from enough benign servers that
accepted the conditional-write candidate to establish it. In the case of inline_write and
inline_barrier conditional-writes, ObjectHistorySet' includes the replica histories of the benign
servers that accepted the write candidates and barrier candidates, respectively. Also within this
section, we define CT = rcw_classify(ObjectHistorySet) and CT' = rcw_classify(ObjectHistorySet').

Lemma 2.42. If write(Data, ObjectHistorySet) is established yielding ObjectHistorySet', then,
letting X = CT'.LatestCandidate, we have that:

X.Data = Data;

X.LT.Time = rcw_latest_time(ObjectHistorySet).Time + 1;

X.LT.BarrierFlag = false;

X.LT.ClientID = ClientID;

X.LT.DataVerifier = hash(Data);

X.LT.OHSVerifier = hash(ObjectHistorySet);

X.LT_co = CT.LatestCandidate.LT.

Proof. See the pseudo-code. Note that, because of the pre-conditions on a write, X.LT.Time =
CT.LatestCandidate.LT.Time + 1 is also true.
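The post-condition of Lemma 2.42 can be read as a recipe for the candidate a write produces. The sketch below builds that candidate under stated assumptions: SHA-256 stands in for the paper's `hash`, `ohs_bytes` is a hypothetical canonical encoding of ObjectHistorySet, `None` stands in for ⊥, and the type and function names (`LT`, `Candidate`, `make_write_candidate`) are illustrative rather than taken from the pseudo-code.

```python
import hashlib
from typing import NamedTuple, Optional


class LT(NamedTuple):
    """Logical timestamp; NamedTuple comparison follows field order (Definition 2.9)."""
    time: int
    barrier_flag: bool
    client_id: bytes
    data_verifier: bytes
    ohs_verifier: bytes


class Candidate(NamedTuple):
    lt: LT
    lt_co: LT
    data: Optional[bytes]  # None stands in for ⊥


def digest(b: bytes) -> bytes:
    # SHA-256 is an assumed choice for the paper's hash().
    return hashlib.sha256(b).digest()


def make_write_candidate(data: bytes, client_id: bytes, latest_time: LT,
                         latest_candidate: Candidate, ohs_bytes: bytes) -> Candidate:
    """Build the candidate that Lemma 2.42 says an established write yields.

    ohs_bytes is a hypothetical canonical encoding of ObjectHistorySet.
    """
    lt = LT(time=latest_time.time + 1,
            barrier_flag=False,
            client_id=client_id,
            data_verifier=digest(data),
            ohs_verifier=digest(ohs_bytes))
    return Candidate(lt=lt, lt_co=latest_candidate.lt, data=data)


zero = LT(0, False, b"", b"", b"")
initial = Candidate(lt=zero, lt_co=zero, data=None)
c = make_write_candidate(b"hello", b"client-a", initial.lt, initial, b"ohs-bytes")
print(c.lt.time, c.lt_co == initial.lt, c.lt > initial.lt)  # 1 True True
```

Because OHSVerifier hashes the conditioned-on object history set, two writes conditioned on different object history sets produce distinct timestamps even when they carry the same data.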

Lemma 2.43. If inline_write(⊥, ObjectHistorySet) is established yielding ObjectHistorySet', then
CT'.LatestCandidate = CT.LatestCandidate.

Proof. See the pseudo-code.

Lemma 2.44. If barrier(⊥, ObjectHistorySet) is established yielding ObjectHistorySet', then
CT'.LatestCandidate = CT.LatestCandidate. Letting X = CT'.LatestBarrier, we have that:

X.Data = ⊥;

X.LT.Time = rcw_latest_time(ObjectHistorySet).Time + 1;

X.LT.BarrierFlag = true;

X.LT.ClientID = ClientID;

X.LT.DataVerifier = ⊥;

X.LT.OHSVerifier = hash(ObjectHistorySet);

X.LT_co = CT.LatestCandidate.LT.

Proof. See the pseudo-code.

Lemma 2.45. If inline_barrier(⊥, ObjectHistorySet) is established yielding ObjectHistorySet',
then CT'.LatestBarrier = CT.LatestBarrier and CT'.LatestCandidate = CT.LatestCandidate.

Proof. See the pseudo-code.
Lemma 2.46. If copy(⊥, ObjectHistorySet) establishes a candidate yielding ObjectHistorySet',
then, letting X := CT'.LatestCandidate, we have that:

X.Data = CT.LatestCandidate.Data;

X.LT.Time = rcw_latest_time(ObjectHistorySet).Time + 1;

X.LT.BarrierFlag = false;

X.LT.ClientID = ClientID;

X.LT.DataVerifier = hash(X.Data);

X.LT.OHSVerifier = hash(ObjectHistorySet);

X.LT_co = CT.LatestCandidate.LT.

Proof. See the pseudo-code. Note that, because of the pre-conditions on copy, X.LT.Time =
CT.LatestBarrier.LT.Time + 1 is also true.
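Lemmas 2.44 and 2.46 describe the contention path: a barrier is written above an incomplete candidate, and a copy then re-writes the latest established value above that barrier. The sketch below walks through that sequence under stated assumptions: SHA-256 stands in for `hash`, `None` and `b""` stand in for ⊥ in Data and DataVerifier respectively, the `ohs*` byte strings are hypothetical encodings of the conditioned-on object history sets, and all names are illustrative. It also assumes the copied value candidate carries data (the initial candidate's ⊥ would need separate handling).

```python
import hashlib
from typing import NamedTuple, Optional


class LT(NamedTuple):
    time: int
    barrier_flag: bool
    client_id: bytes
    data_verifier: bytes   # b"" stands in for ⊥ on barriers
    ohs_verifier: bytes


class Candidate(NamedTuple):
    lt: LT
    lt_co: LT
    data: Optional[bytes]  # None stands in for ⊥


def digest(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()  # assumed stand-in for hash()


def make_barrier(client_id: bytes, latest_time: LT,
                 latest_candidate: Candidate, ohs_bytes: bytes) -> Candidate:
    """Lemma 2.44: a barrier carries no data and sets BarrierFlag."""
    lt = LT(latest_time.time + 1, True, client_id, b"", digest(ohs_bytes))
    return Candidate(lt=lt, lt_co=latest_candidate.lt, data=None)


def make_copy(client_id: bytes, latest_time: LT,
              latest_candidate: Candidate, ohs_bytes: bytes) -> Candidate:
    """Lemma 2.46: a copy re-writes LatestCandidate.Data at a new timestamp."""
    data = latest_candidate.data
    lt = LT(latest_time.time + 1, False, client_id, digest(data), digest(ohs_bytes))
    return Candidate(lt=lt, lt_co=latest_candidate.lt, data=data)


zero = LT(0, False, b"", b"", b"")
v1 = Candidate(LT(1, False, b"client-a", digest(b"v1"), digest(b"ohs0")), zero, b"v1")
incomplete = LT(2, False, b"client-c", digest(b"x"), digest(b"ohs1"))  # latest timestamp
barrier = make_barrier(b"client-b", incomplete, v1, b"ohs1'")
copy = make_copy(b"client-b", barrier.lt, v1, b"ohs2")
print(barrier.lt.time, copy.lt.time, copy.data, copy.lt_co == v1.lt)  # 3 4 b'v1' True
```

Both new candidates are conditioned on the same established value v1, which is what lets the copy carry v1 forward past the incomplete candidate at Time 2.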

2.7 Repair conditional-writes

In this section, we consider inline_write and inline_barrier. We show that for an execution
that includes inline_write and inline_barrier conditional-writes, there exists an execution that
does not include such operations and that has identical server-side state transitions. Server-side
state transitions are changes to the state maintained by the server (i.e., its replica history and
corresponding authenticator). As such, the remainder of the safety proof focuses on write, barrier,
and copy conditional-writes.

In an obvious variation of the protocol, inline_write and inline_barrier conditional-writes
are not employed: each is replaced with a barrier followed by a copy. The inline_write
and inline_barrier conditional-writes are included in the proof because they are included in our
implementation of the R/CW protocol. In practice, such repairs reduce read/conditional-write
contention, since a read that initiates such a repair does not contend with the conditional-write it repairs.

Lemma 2.47. For every inline_write performed, there exists an execution in which the server-side
state transition is identical, but in which the inline_write was not performed.

Proof. Lemmas 2.42 and 2.43 show that the post-conditions (i.e., the server-side state transitions)
for a write and an inline_write are identical. From Lemmas 2.36 and 2.37, we note that the
pre-condition for an inline_write differs from that of a write only in the existence of the value
candidate being repaired. As such, there exists another execution in which any server-side transition
resulting from the acceptance of an inline_write is instead due to the acceptance of a write.

Lemma 2.48. For every inline_barrier performed, there exists an execution in which the server-side
state transition is identical, but in which the inline_barrier was not performed.

Proof. Lemmas 2.44 and 2.45 show that the post-conditions (i.e., the server-side state transitions)
for a barrier and an inline_barrier are identical. From Lemmas 2.38 and 2.39, we note that the
pre-condition for an inline_barrier differs from that of a barrier only in the existence of the barrier
candidate being repaired. As such, there exists another execution in which any server-side transition
resulting from the acceptance of an inline_barrier is instead due to the acceptance of a barrier.

2.8 Write-CW segments

In this section, we define a write-CW segment. In the next section, we define a copy-CW segment.
Then, we define a copy-CW segment-chain. From the properties of write-CW segments and
copy-CW segment-chains, we demonstrate that all established value candidates are in the
conditioned-on chain.


Definition 2.49 (segment, LT_begin, LT_end). A segment is a logical timestamp interval. The logical
timestamp that begins a segment is denoted LT_begin. The logical timestamp that ends a segment
is denoted LT_end.

Definition 2.50 (write-CW segment). Every write that establishes a candidate defines a write-CW
segment. Consider an established value candidate Candidate written by a write. The write-CW
segment defined by Candidate has LT_end = Candidate.LT and LT_begin = Candidate.LT_co.

Lemma 2.51. For every write-CW segment in an execution, there are no potential barrier
candidates with a timestamp in the range [LT_begin, LT_end].

Proof. Consider the pre-conditions for a write (cf. Lemma 2.36). Because of the pre-conditions,
when a benign server accepts a write-CW candidate Candidate, the server's replica history contains
no candidates between Candidate.LT_co (LT_begin) and Candidate.LT (LT_end). Moreover, because
benign servers only accept candidates conditioned-on a current object history set, such servers never
accept a candidate in the range [LT_begin, LT_end] at a later point in the execution. Because of the
intersection property between established candidates and potential candidates (cf. Definition 2.21),
there cannot exist a potential barrier with a timestamp in the range [LT_begin, LT_end].

Lemma 2.52. For each write-CW segment in an execution, there are no potential value candidates
with a timestamp in the range (LT_begin, LT_end).

Proof. This proof is similar to the proof of Lemma 2.51. However, the range in which there are no
potential value candidates differs from the range in which there are no potential barrier candidates:
by definition, there is an established (and therefore potential) value candidate at both LT_begin
and LT_end.
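Lemmas 2.51 and 2.52 state ranges that differ only in their endpoints: barriers are excluded from the closed interval, values only from the open one. The sketch below is a hypothetical checker, of the kind one might run over a recorded execution in a test harness; it is not part of the protocol. Timestamps are simplified to (Time, BarrierFlag, ClientID) tuples, with index 1 serving as the barrier flag, and the function name is illustrative.

```python
from typing import Iterable, Tuple

# Simplified timestamp (Time, BarrierFlag, ClientID); index 1 is the barrier flag.
Timestamp = Tuple[int, bool, bytes]


def write_segment_respected(lt_begin: Timestamp, lt_end: Timestamp,
                            potential: Iterable[Timestamp]) -> bool:
    """Check Lemmas 2.51 and 2.52 against timestamps of potential candidates."""
    for lt in potential:
        is_barrier = lt[1]
        if is_barrier and lt_begin <= lt <= lt_end:
            return False  # Lemma 2.51: no potential barrier in [LT_begin, LT_end]
        if not is_barrier and lt_begin < lt < lt_end:
            return False  # Lemma 2.52: no potential value in (LT_begin, LT_end)
    return True


begin, end = (1, False, b"a"), (2, False, b"b")
print(write_segment_respected(begin, end, [begin, end]))                    # True
print(write_segment_respected(begin, end, [begin, (1, True, b"x"), end]))   # False
print(write_segment_respected(begin, end, [begin, (1, False, b"z"), end]))  # False
```

The endpoints themselves are value candidates, which is why the value check uses strict inequalities while the barrier check does not.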

2.9 Copy-CW segments

The write is expected to be the “common case” in the R/CW protocol. The copy covers the
“corner case” in which there is contention. Note that not all copy-CW segments are of interest;
only those that are part of a copy-CW segment-chain are of interest. We define the copy-CW
segment-chain later.

Definition 2.53 (copy-CW segment). Every copy that establishes a candidate defines a copy-CW
segment. Consider an established value candidate Candidate written by a copy. The copy-CW
segment defined by Candidate has LT_end = Candidate.LT and LT_begin = Candidate.LT_co.

Lemma 2.54. For every copy-CW segment in an execution, there exists at least one established barrier candidate with a timestamp in the range $(LT_{begin}, LT_{end})$.

Proof. Consider the pre-conditions for a copy (cf. Lemma 2.40). The pre-conditions require there to be an established barrier candidate before a benign server will accept a copy candidate. Since the potential value candidate that defines the copy-CW segment is accepted by at least one benign server, the pre-conditions held, and there exists an established barrier candidate with a timestamp in the range $(LT_{begin}, LT_{end})$.

Lemma 2.55. For each copy-CW segment in an execution, there are no established value candidates with a timestamp in the range $(LT_{begin}, LT_{end})$.
Proof. We show that the potential value candidate that defines the copy-CW segment precludes an established value candidate in the range $(LT_{begin}, LT_{end})$. Consider the pre-conditions for a copy (cf. Lemma 2.40) and the intersection property between established and potential candidates (cf. Definition 2.21). If an established value candidate existed with a timestamp in the range $(LT_{begin}, LT_{end})$, classification would identify it as the conditioned-on value candidate, since Lemma 2.22 ensures that an established value candidate is always classified as repairable. Since the conditioned-on value candidate has timestamp $LT_{begin}$, there does not exist an established value candidate with a timestamp in the range $(LT_{begin}, LT_{end})$. The value candidate that is conditioned on and the value candidate that ends the copy segment may both be established, which is why the range is $(LT_{begin}, LT_{end})$ and not $[LT_{begin}, LT_{end}]$.

2.10  Copy-CW  segment-chains 

In  this  section  we  define  a  copy-CW  segment-chain.  Such  a  chain  consists  of  one  or  more  copy- 
CW  segments:  the  first  segment  in  the  chain  begins  with  an  established  value  candidate  and  the 
final  segment  in  the  chain  ends  with  an  established  value  candidate.  All  segments  in  between  the 
established  value  candidates  begin  and  end  with  a  potential,  but  not  established,  value  candidate. 
Not all copy-CW segments that exist in the timestamp interval between the two established value candidates that define the copy-CW segment-chain need to be in the copy-CW segment-chain.

Definition 2.56 (opening copy-CW segment). A copy-CW segment that begins with an established value candidate is called an opening copy-CW segment.

Definition 2.57 (terminating copy-CW segment). A copy-CW segment that ends with an established value candidate is called a terminating copy-CW segment.

Definition 2.58 (copy-CW segment-chain). A copy-CW segment-chain consists of an opening copy-CW segment, a terminating copy-CW segment, and a finite number (possibly zero) of copy-CW segments between them.

Observation  2.59.  A  copy-CW  segment  may  be  both  an  opening  and  terminating  copy-CW 
segment  and  so  a  copy-CW  segment-chain  may  consist  of  a  single  copy-CW  segment. 

Lemma 2.60. For every non-opening copy-CW segment $X$ in a copy-CW segment-chain, there exists exactly one other copy-CW segment $Y$ in the copy-CW segment-chain such that $X.LT_{begin} = Y.LT_{end}$.

Proof.  The  pre-condition  for  a  copy  (see  Lemma  2.40)  and  the  post-conditions  for  a  copy  (see 
Lemma  2.46)  ensure  this. 

Lemma 2.61. For every non-terminating copy-CW segment $X$ in a copy-CW segment-chain, there exists exactly one other copy-CW segment $Y$ in the copy-CW segment-chain such that $X.LT_{end} = Y.LT_{begin}$.

Proof.  The  pre-condition  for  a  copy  (see  Lemma  2.40)  and  the  post-conditions  for  a  copy  (see 
Lemma  2.46)  ensure  this. 

Definition 2.62 (copy-CW segment-chain begin and end). A copy-CW segment-chain begins at $LT_{begin}$ of its opening copy-CW segment and ends at $LT_{end}$ of its terminating copy-CW segment; as such, the copy-CW segment-chain also has an $LT_{begin}$ and an $LT_{end}$.
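Lemmas 2.60 and 2.61 suggest a simple way to recover a chain once its terminating segment is known: walk backward along $LT_{end} \to LT_{begin}$ links until reaching a segment that begins at an established value candidate. The Python sketch below is a hypothetical model, not part of the protocol: timestamps are plain integers, a copy-CW segment is reduced to its $(LT_{begin}, LT_{end})$ pair, and `Segment` and `assemble_chain` are illustrative names. Because each segment is defined by a distinct candidate, $LT_{end}$ identifies a segment uniquely, so the backward walk recovers exactly one chain even when several potential copies condition on the same $LT_{begin}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A copy-CW segment reduced to its boundary timestamps (hypothetical model)."""
    lt_begin: int  # timestamp of the conditioned-on value candidate
    lt_end: int    # timestamp of the value candidate written by the copy


def assemble_chain(terminating: Segment, segments: list[Segment],
                   established: set[int]) -> list[Segment]:
    """Return the copy-CW segment-chain that ends with `terminating`.

    Walks backward through LT_end -> LT_begin links until it reaches a segment
    that begins at an established value candidate (the opening segment).
    """
    by_end = {seg.lt_end: seg for seg in segments}
    chain = [terminating]
    while chain[-1].lt_begin not in established:
        # The unique predecessor is the segment that ends where this one begins.
        chain.append(by_end[chain[-1].lt_begin])
    chain.reverse()  # opening segment first, terminating segment last
    return chain


# Two potential copies condition on timestamp 10; only one of them leads
# to the established value candidate at 16.
segments = [Segment(10, 12), Segment(10, 13), Segment(12, 14), Segment(14, 16)]
chain = assemble_chain(Segment(14, 16), segments, established={10, 16})
print(chain[0].lt_begin, chain[-1].lt_end, len(chain))  # 10 16 3
```

The dead-end segment `Segment(10, 13)` is excluded, matching the remark that not every copy-CW segment in the interval belongs to the chain.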

Lemma 2.63. For each copy-CW segment-chain in an execution, there are no established value candidates with a timestamp in the range $(LT_{begin}, LT_{end})$.




Proof. Lemma 2.55 ensures that there are no established value candidates within the timestamp interval of each copy-CW segment in a copy-CW segment-chain. Lemmas 2.60 and 2.61 ensure that there are only copy-CW segments in the timestamp interval of the copy-CW segment-chain. Therefore, there are no established value candidates with a timestamp in the range $(LT_{begin}, LT_{end})$.

Observation 2.64. Once a barrier candidate is established, the only way a value candidate with a higher timestamp can be established is via a copy. Until there is a terminating segment, the set of segments in the segment-chain may be unknown, because multiple potential value candidates can condition on the established value candidate with timestamp $LT_{begin}$. Therefore, there can exist multiple "cycles" of established barriers followed by multiple potential value candidates. In each of these "cycles", the set of potential value candidates that result from copy conditional-writes is a subset of the potential value candidates that precede the established barrier candidate and succeed $LT_{begin}$. Once there is a terminating segment, membership in the segment-chain is determined.

2.11  The  condition-on  chain 

In  this  section  we  show  that  from  the  latest  established  value  candidate  in  an  execution  back  to 
logical  time  0,  there  exists  a  continuous  chain  of  write-CW  segments  and  copy-CW  segment-chains. 
We  refer  to  this  chain  as  the  condition-on  chain.  In  this  section  we  use  the  term  segment  to  refer 
to  either  a  write-CW  segment  or  a  copy-CW  segment-chain. 

Definition 2.65 (condition-on chain). The condition-on chain for an execution comprises the latest established value candidate in the execution as well as each candidate found by repeatedly following conditioned-on timestamps ($LT_{co}$).
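Following conditioned-on timestamps is a pointer chase back to logical time 0. The sketch below is hypothetical: it identifies candidates by integer timestamps and assumes a dictionary mapping each candidate's $LT$ to its $LT_{co}$; the initial candidate $(0, 0, \bot)$ conditions on itself, so the walk stops at timestamp 0. The strictly decreasing result is the kind of total order that Lemma 2.67 relies on.

```python
def condition_on_chain(latest: int, lt_co: dict[int, int]) -> list[int]:
    """Follow conditioned-on timestamps from `latest` back to logical time 0."""
    chain = [latest]
    while chain[-1] != 0:
        previous = lt_co[chain[-1]]
        # A candidate always conditions on a strictly earlier timestamp.
        if previous >= chain[-1]:
            raise ValueError(f"timestamp {chain[-1]} does not condition on an earlier candidate")
        chain.append(previous)
    return chain


# 16 ends a copy chain 10 -> 12 -> 14 -> 16; 10 was written on 5; 5 was written on 0.
lt_co = {16: 14, 14: 12, 12: 10, 10: 5, 5: 0, 0: 0}
print(condition_on_chain(16, lt_co))  # [16, 14, 12, 10, 5, 0]
```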

Lemma 2.66. Every established value candidate is in the condition-on chain.

Proof. Consider an execution in which there is a single established value candidate. By definition, this single established value candidate is $(0, 0, \bot)$. Now, consider an execution in which there are two established value candidates. Clearly, one is $(0, 0, \bot)$ and the other is the latest established value candidate. Following the conditioned-on timestamp of the latest established value candidate leads to $(0, 0, \bot)$. Lemma 2.52 proves that there are no potential (and hence no established) value candidates within the timestamp interval of a write-CW segment, while there is an established value candidate at its beginning and at its end. Lemma 2.63 proves that there are no established value candidates within the timestamp interval of a copy-CW segment-chain, while there is an established value candidate at its beginning and at its end. Since the latest established value candidate is part of either a write-CW segment or a copy-CW segment-chain, and since there are only two established value candidates, both established value candidates are in the condition-on chain. Now, consider an execution in which there are $x$ established value candidates. The latest established value candidate (i.e., the $x$-th) is part of either a write-CW segment or a copy-CW segment-chain that begins with the next-latest established value candidate (i.e., the $(x-1)$-th). By induction, then, all established value candidates are in the condition-on chain (and the condition-on chain ends at $(0, 0, \bot)$).

2.12  Linearizable  reads  and  conditional-writes 

Intuitively,  linearizability  [5]  requires  that  each  read  return  a  value  consistent  with  some  execution 
in  which  each  read  and  write  is  performed  at  a  distinct  point  in  time  between  when  the  client 
invokes  the  operation  and  when  the  operation  returns.  For  the  purposes  of  the  R/CW  protocol,  we 
consider the linearizability of reads and conditional-writes (rather than reads and writes).
The condition-on chain that results from write-CW segments and copy-CW segment-chains induces a total order on all established value candidates. This total order is sufficient to demonstrate the linearizability of all conditional-writes that establish a candidate. Reads are shown to be partially ordered by the established value candidate they return and to obey their real-time ordering relation. Since multiple distinct reads may return the same established value candidate, the total order on established value candidates provides only a partial ordering of reads. Copy-CW segment-chains that literally copy an established value candidate may induce ordering on a subset of reads that return the same "data" (albeit different established value candidates). The partial ordering on reads can be arbitrarily extended to a total ordering, and thus the set of all reads that return a candidate and all conditional-writes that establish a candidate is linearizable.

Lemma 2.67. All established value candidates are totally ordered by their timestamps.

Proof.  Lemma  2.66  proves  that  all  established  value  candidates  are  in  the  condition-on  chain  and 
that  the  condition-on  chain  totally  orders  all  established  value  candidates  by  timestamp. 

Definition 2.68 (conditional-write begins). A conditional-write begins once a benign server accepts a conditional-write candidate that corresponds to the conditional-write.

Definition 2.69 (conditional-write ends). A conditional-write ends once the candidate corresponding to it is an established value candidate. As such, we simply refer to such a conditional-write as established.

Definition 2.70 (read begins). A read begins once a read response from a benign server is received.

Definition 2.71 (read ends). A read ends once it returns a candidate.

Lemma 2.72. A read that returns a candidate returns an established value candidate.

Proof. By inspection of the pseudo-code (Figure 2): repair is attempted until the pre-conditions for a write are met (i.e., until an established value candidate is latest).

Lemma 2.73. The set of reads by benign clients in an execution is partially ordered by the timestamp of the established value candidate returned.

Proof. As Lemma 2.67 proves, all established value candidates in the execution are totally ordered. The total order on established value candidates induces a partial order on all reads (which return only established value candidates, due to Lemma 2.72). For a read to return a candidate, the candidate must be established before the read ends (due to the classification rules, Definitions 2.16 and 2.17). It does not matter whether the candidate returned by the read was established before the read began or at some point during the read; in either case, the candidate returned by the read is consistent with the partial order.

Observation 2.74. Lemma 2.73 excludes reads by malevolent clients because malevolent clients can return arbitrary values from reads. Indeed, such clients need not even issue a read request before returning a (potentially forged) candidate.

Lemma 2.75. In an execution, conditional-writes that establish value candidates and the values returned by reads are linearizable.
Proof. Lemma 2.67 proves that all established value candidates are totally ordered by timestamp.
This  ordering  is  consistent  with  the  real-time  ordering  of  the  conditional-write  operations  that 
yielded  the  established  value  candidates.  Lemma  2.73  proves  that  reads  are  partially  ordered  by 
the  timestamp  of  the  candidate  returned.  The  partial  ordering  can  be  totally  ordered  so  as  to 
be  consistent  with  the  real-time  ordering  of  the  read  operations.  As  such,  conditional-writes  that 
establish  value  candidates  and  the  values  returned  by  reads  are  linearizable. 

2.13  Liveness 

The  R/CW  protocol  as  described  provides  a  very  weak  liveness  guarantee,  namely  that  it  is  possible 
to  make  progress. 

Lemma 2.76. It is possible to complete a conditional-write.

Proof. Given any object history set, there is a sequence of client operations that, if each completes, allows a client to complete a conditional-write. Notice that a client may perform the following chain of operations: barrier, inline_barrier, copy, inline_write, and, finally, write. All possible operation types are in this chain, and it ends with a conditional-write. Since every ObjectHistorySet requires one of barrier, inline_barrier, copy, inline_write, or write (see Lemma 2.26), it is possible from any system state to perform a conditional-write.

Unfortunately, Lemma 2.76 does not guarantee that a new value is eventually conditionally-written to the system: malevolent components can prevent correct clients from making progress.

Corollary 2.77. In a benign execution, the R/CW protocol is obstruction-free [4].

Proof. In the absence of contention from other clients and of malevolent components, Lemma 2.76 ensures that a client will complete a conditional-write in a finite number of steps.


3 Read/conditional-write pseudo-code

In this section, pseudo-code for the read/conditional-write (R/CW) protocol is given. The pseudo-code is tailored for the threshold quorum system construction described in the following section.

3.1  Threshold  quorum  constraints 

In this section we develop constraints on threshold quorum systems that meet the definitions from Section 2.2. This threshold quorum system construction can be applied directly to the RT-system quorum construction in [8].

Consider a threshold quorum system in which all $Q \in \mathcal{Q}$ are of size $q$, all $\mathcal{R}(Q)$ are of size $r$, all $T \in \mathcal{T}$ are of size $t$, all $B \in \mathcal{B}$ are of size $b$, and the universe is of size $n$.

From Definition 2.20:

$$q + t \le n \tag{1}$$

From Definition 2.21:

$$n < q + r - b \tag{2}$$

From Definition 2.22:

$$n + r + b \le 2q \tag{3}$$

Combining (1) and (2):

$$q + t \le n < q + r - b \;\Rightarrow\; t + b < r \tag{4}$$

Combining (1) and (3):

$$q + t \le n \le 2q - r - b \;\Rightarrow\; r + t + b \le q \tag{5}$$

Combining (2) and (3):

$$n - q + b < r \le 2q - n - b \;\Rightarrow\; 2n < 3q - 2b \;\Rightarrow\; n < \frac{3q - 2b}{2} \tag{6}$$

These constraints can be summarized as:

$$
\begin{aligned}
t + b &< r, \\
r + t + b &\le q, \\
q + t &\le n, \\
n &< \min\left(q + r - b,\ \frac{3q - 2b}{2}\right).
\end{aligned}
$$

To construct a threshold quorum system parameterized by $\Delta \ge 0$:

$$
\begin{aligned}
r &= t + b + 2\Delta + 1, \\
q &= 2t + 2b + 2\Delta + 1 \quad (= r + t + b), \\
n &= 3t + 2b + 3\Delta + 1 \quad (= q + t + \Delta).
\end{aligned}
$$


With this construction, the upper bound on $n$ is obeyed. First, $q + r - b = 3t + 2b + 4\Delta + 2 > n$. Second, $\frac{3q - 2b}{2} = \frac{6t + 4b + 6\Delta + 3}{2} = 3t + 2b + 3\Delta + 1.5 > n$. Since $n < \frac{3q - 2b}{2} \le 1.5q$, the throughput-scalability ($n/q$) that can be achieved via threshold quorums is bounded by $1.5\times$.
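The construction and its bounds can be checked mechanically. The Python sketch below uses hypothetical function names, with `delta` standing for $\Delta$: it computes $r$, $q$, and $n$ from $t$, $b$, and $\Delta$, tests the summarized constraints using exact fractions for the $\frac{3q - 2b}{2}$ bound, and reports the ratio $n/q$. With $t = b = 1$ and $\Delta = 0$ it yields $n = 6$, $q = 5$, $r = 3$.

```python
from fractions import Fraction


def threshold_quorum(t: int, b: int, delta: int) -> tuple[int, int, int]:
    """Return (n, q, r) for the construction parameterized by delta >= 0."""
    r = t + b + 2 * delta + 1
    q = 2 * t + 2 * b + 2 * delta + 1  # = r + t + b
    n = 3 * t + 2 * b + 3 * delta + 1  # = q + t + delta
    return n, q, r


def satisfies_summary(n: int, q: int, r: int, t: int, b: int) -> bool:
    """Check the summarized threshold quorum constraints."""
    upper = min(Fraction(q + r - b), Fraction(3 * q - 2 * b, 2))
    return t + b < r and r + t + b <= q and q + t <= n and n < upper


for delta in (0, 1, 10, 100):
    n, q, r = threshold_quorum(t=1, b=1, delta=delta)
    print(delta, (n, q, r), satisfies_summary(n, q, r, 1, 1), float(Fraction(n, q)))
```

As $\Delta$ grows, the printed ratio $n/q$ rises toward, but stays below, 1.5.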


3.2  Symbols  and  data  structures 

Symbols  used  in  the  pseudo-code  are  listed  in  Table  1.  Enumerations,  structures,  and  types  used 
in  the  pseudo-code  are  given  in  Figure  1. 
Symbol              Description
s                   A specific server.
U                   The universe of all servers.
ObjectHistorySet    An object history set.
ReplicaHistory      A replica history.
s.ReplicaHistory    The replica history of server s.
α                   An authenticator (array of HMACs).
LT                  Logical timestamp.
LT_co               Conditioned-on logical timestamp.
0                   Well-known initial logical timestamp.
⊥                   A null value; sometimes used to indicate an unused argument/answer.
Q                   A quorum.
𝒬                   The quorum system.
n                   The size of the universe of servers.
q                   The size of each quorum in a threshold quorum system; threshold for classifying a candidate complete.
r                   The size of each repairable set for a quorum; threshold for classifying a candidate repairable.
t                   The threshold number of faulty servers.
b                   The threshold number of Byzantine faulty servers.

Table 1. Symbols used in the pseudo-code.

200: /* Enumerations. */
201: /* Types of operations. */
202: CWType ∈ {write, inline_write, barrier, inline_barrier, copy}
203: /* Structures. */
204: /* Logical timestamps. */
205: LT = {
206:   Time           /* Major component of logical time. */
207:   BarrierFlag    /* Boolean flag indicating barrier or value. */
208:   ClientID       /* Client ID. */
209:   DataVerifier   /* Hash of data value. */
210:   OHSVerifier    /* Hash of conditioned-on ObjectHistorySet. */
211: }
212: /* Candidate (initialized to (0, 0, ⊥)). */
213: Candidate = {
214:   LT             /* Logical timestamp of candidate. */
215:   LT_co          /* Timestamp of conditioned-on candidate. */
216:   Data           /* Candidate's data. */
217: }
218: /* Types. */
219: /* Replica history is an ordered set of candidates. */
220: ReplicaHistory = {Candidate}
221: /* Authenticator is an array of HMACs indexed by server (U is the universe of servers). */
222: α = HMAC[U]
223: /* Object history set is an array of replica histories indexed by server. */
224: ObjectHistorySet = ⟨ReplicaHistory, α⟩[U]

Figure 1. Enumerations, structures, and types for the R/CW pseudo-code.
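For readers who prefer concrete types, Figure 1 might be transcribed into Python roughly as follows. This is a hypothetical transcription, not the prototype's code: field names are snake-cased, HMACs are `bytes`, `None` plays the role of ⊥, and logical timestamps are assumed to compare lexicographically in field order (Time first), which the figure does not itself specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Optional, Tuple


class CWType(Enum):
    """Types of conditional-write operations (line 202)."""
    WRITE = "write"
    INLINE_WRITE = "inline_write"
    BARRIER = "barrier"
    INLINE_BARRIER = "inline_barrier"
    COPY = "copy"


@dataclass(frozen=True, order=True)
class LT:
    """Logical timestamp; order=True compares fields in declaration order (assumed)."""
    time: int                # major component of logical time
    barrier_flag: bool       # True for a barrier, False for a value
    client_id: str = ""      # ID of the client that created the timestamp
    data_verifier: str = ""  # hash of the data value
    ohs_verifier: str = ""   # hash of the conditioned-on ObjectHistorySet


@dataclass(frozen=True)
class Candidate:
    lt: LT                        # logical timestamp of the candidate
    lt_co: LT                     # timestamp of the conditioned-on candidate
    data: Optional[bytes] = None  # None plays the role of the null value


ReplicaHistory = FrozenSet[Candidate]  # sort by .lt to obtain the ordered view
Authenticator = Dict[str, bytes]       # one HMAC per server ID
ObjectHistorySet = Dict[str, Tuple[ReplicaHistory, Authenticator]]  # indexed by server ID

ZERO = LT(0, False)              # the well-known initial timestamp 0
INITIAL = Candidate(ZERO, ZERO)  # the initial candidate (0, 0, ⊥)
```

Freezing the dataclasses makes candidates hashable, so a replica history can be an ordinary (frozen) set.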
3.3 Client-side


Client-side functions are listed in Figure 2. To perform a read, a null object history set (i.e., ⊥) is sent in the conditional-write request (cf. line 400). In the pseudo-code shown, reads retry repair until a candidate is classified complete; they could abort instead. Conditional-writes are shown to abort if they do not establish a candidate; they could retry repair instead. The quorum probing policy shown in c_rcw_quorum_rpc is inefficient; a more efficient approach to probing for a quorum of responses is implemented in the prototype. Functions for determining the classification tuple of an object history set are listed in Figure 3.
c_rcw_initialize():
300: /* The version history returned from each server in the read operation is kept in ObjectHistorySet. */
301: for each (s ∈ U) do
302:   ObjectHistorySet[s].ReplicaHistory := {(0, 0, ⊥)}
303:   ObjectHistorySet[s].α := ⊥
304: end for

c_rcw_read():
400: ⟨⊥, ObjectHistorySet⟩ := c_rcw_quorum_rpc(⊥, ⊥)  /* Passing in (⊥, ⊥) indicates a read request. */
401: ⟨CWType, Candidate, ⊥⟩ := rcw_classify(ObjectHistorySet)
402: if (CWType ≠ write) then
403:   /* Perform repair. */
404:   ObjectHistorySet := c_rcw_repair(ObjectHistorySet)
405:   ⟨CWType, Candidate, ⊥⟩ := rcw_classify(ObjectHistorySet)
406:   /* Since repair returned, CWType = write. */
407: end if
408: return (⟨SUCCESS, Candidate.Data⟩)

c_rcw_write(Data, ObjectHistorySet):  /* write. */
500: ⟨Order, ObjectHistorySet⟩ := c_rcw_quorum_rpc(Data, ObjectHistorySet)
501: if (Order ≥ q) then
502:   return (⟨SUCCESS, ⊥⟩)
503: end if
504: /* Otherwise, perform repair and then retry. */
505: ObjectHistorySet := c_rcw_repair(ObjectHistorySet)
506: return (c_rcw_write(Data, ObjectHistorySet))

c_rcw_repair(ObjectHistorySet):  /* inline_write, barrier, inline_barrier, and copy. */
600: repeat
601:   backoff()  /* Backoff to avoid livelock. */
602:   /* Perform a barrier or copy (depends on ObjectHistorySet). */
603:   ⟨⊥, ObjectHistorySet⟩ := c_rcw_quorum_rpc(⊥, ObjectHistorySet)
604:   ⟨CWType, Candidate, ⊥⟩ := rcw_classify(ObjectHistorySet)
605: until (CWType = write)
606: return (ObjectHistorySet)

c_rcw_quorum_rpc(Data, ObjectHistorySet):
700: ResponseSet := ∅
701: Count := 0
702: repeat
703:   /* Eliding probing policy. For simplicity broadcast to all servers. */
704:   for each (s ∈ U \ ResponseSet.s) do
705:     send(s, Data, ObjectHistorySet)
706:   end for
707:   if (poll() = true) then
708:     ⟨s, Status, ⟨ReplicaHistory, α⟩⟩ := receive()
709:     if (s ∉ ResponseSet.s) then
710:       ObjectHistorySet[s] := ⟨ReplicaHistory, α⟩
711:       ResponseSet := ResponseSet ∪ {s}
712:     end if
713:     if (Status = SUCCESS) then
714:       Count := Count + 1
715:     end if
716:   end if
717: until (∃Q ⊆ ResponseSet : Q ∈ 𝒬)
718: return (⟨Count, ObjectHistorySet⟩)

Figure 2. Client-side R/CW pseudo-code.
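The response-collection loop of c_rcw_quorum_rpc can be exercised without any networking by feeding it replies in arrival order. The Python sketch below is hypothetical: replies are `(server, status, history)` tuples, and the quorum test $\exists Q \subseteq \text{ResponseSet} : Q \in \mathcal{Q}$ becomes "at least $q$ distinct servers" for a threshold quorum system. As a design choice not taken from Figure 2, a success is counted only for a server's first recorded reply, so that a duplicate reply (possible when requests are re-sent) is not counted twice.

```python
from typing import Dict, Iterable, Tuple


def collect_quorum(replies: Iterable[Tuple[str, str, object]], q: int):
    """Return (count of successes, object history set) once q servers replied."""
    ohs: Dict[str, object] = {}
    count = 0
    for server, status, history in replies:
        if server in ohs:
            continue                  # duplicate reply from a server already heard
        ohs[server] = history
        if status == "SUCCESS":
            count += 1
        if len(ohs) >= q:             # threshold quorum: any q distinct servers
            return count, ohs
    # A real client would keep re-sending instead of giving up.
    raise TimeoutError("replies ran out before a quorum responded")


replies = [("s1", "SUCCESS", "h1"), ("s1", "SUCCESS", "h1"),
           ("s2", "FAIL", "h2"), ("s3", "SUCCESS", "h3"), ("s4", "SUCCESS", "h4")]
count, ohs = collect_quorum(replies, q=3)
print(count, sorted(ohs))  # 2 ['s1', 's2', 's3']
```

In c_rcw_write, a count of at least $q$ means a full quorum accepted the candidate; here only two of the three responders accepted, so the client would proceed to repair.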
rcw_classify(ObjectHistorySet):
800: /* Get latest object version, barrier version, and timestamp. */
801: LatestCandidate := rcw_latest_candidate(ObjectHistorySet, false)
802: LatestBarrier := rcw_latest_candidate(ObjectHistorySet, true)
803: LT_latest := rcw_latest_time(ObjectHistorySet)
804: /* Determine which type of operation to perform. */
805: if (LT_latest = LatestCandidate.LT) ∧ (rcw_order(LatestCandidate, ObjectHistorySet) ≥ q) then
806:   CWType := write
807: else if (LT_latest = LatestCandidate.LT) ∧ (rcw_order(LatestCandidate, ObjectHistorySet) ≥ r) then
808:   CWType := inline_write
809: else if (LT_latest = LatestBarrier.LT) ∧ (rcw_order(LatestBarrier, ObjectHistorySet) ≥ q) then
810:   CWType := copy
811: else if (LT_latest = LatestBarrier.LT) ∧ (rcw_order(LatestBarrier, ObjectHistorySet) ≥ r) then
812:   CWType := inline_barrier
813: else
814:   CWType := barrier
815: end if
816: return (⟨CWType, LatestCandidate, LatestBarrier⟩)

rcw_latest_candidate(ObjectHistorySet, BarrierFlag):
900: CandidateSet := {Candidate : (rcw_order(Candidate, ObjectHistorySet) ≥ r) ∧
901:     (Candidate.LT.BarrierFlag = BarrierFlag)}
902: Candidate := {Candidate : (Candidate ∈ CandidateSet) ∧ (Candidate.LT = max(CandidateSet.LT))}
903: return (Candidate)

rcw_order(Candidate, ObjectHistorySet):
1000: return (|{s ∈ U : Candidate ∈ ObjectHistorySet[s].ReplicaHistory}|)

rcw_latest_time(ObjectHistorySet):
1100: return (max(ObjectHistorySet[U].ReplicaHistory.LT))

Figure 3. R/CW classification of an object history set.
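The classification in Figure 3 translates almost line for line into Python. The sketch below is hypothetical: it models a candidate only by a `(time, barrier, client)` timestamp tuple, represents an object history set as a mapping from server ID to a set of candidates, and passes the thresholds `q` and `r` explicitly. The example uses $q = 5$ and $r = 3$, the parameters the Section 3.1 construction gives for $t = b = 1$, $\Delta = 0$.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Candidate:
    time: int
    barrier: bool
    client: str = ""  # tie-breaker standing in for the remaining LT fields

    @property
    def lt(self):
        return (self.time, self.barrier, self.client)


OHS = Dict[str, Set[Candidate]]  # server ID -> replica history


def rcw_order(c: Candidate, ohs: OHS) -> int:
    """Number of replica histories that contain c (line 1000)."""
    return sum(1 for history in ohs.values() if c in history)


def rcw_latest_candidate(ohs: OHS, barrier: bool, r: int) -> Optional[Candidate]:
    """Latest value (or barrier) candidate whose order is at least r (lines 900-903)."""
    pool = {c for h in ohs.values() for c in h
            if c.barrier == barrier and rcw_order(c, ohs) >= r}
    return max(pool, key=lambda c: c.lt, default=None)


def rcw_classify(ohs: OHS, q: int, r: int) -> str:
    """Return the operation type a client should perform next (lines 800-816)."""
    latest_value = rcw_latest_candidate(ohs, False, r)
    latest_barrier = rcw_latest_candidate(ohs, True, r)
    lt_latest = max(c.lt for h in ohs.values() for c in h)
    if latest_value is not None and latest_value.lt == lt_latest:
        # Order >= r is already guaranteed by rcw_latest_candidate.
        return "write" if rcw_order(latest_value, ohs) >= q else "inline_write"
    if latest_barrier is not None and latest_barrier.lt == lt_latest:
        return "copy" if rcw_order(latest_barrier, ohs) >= q else "inline_barrier"
    return "barrier"


init, v1, b2 = Candidate(0, False), Candidate(1, False, "c1"), Candidate(2, True, "c2")
servers = ["s1", "s2", "s3", "s4", "s5"]
ohs = {s: ({init, v1} if i < 3 else {init}) for i, s in enumerate(servers)}
print(rcw_classify(ohs, q=5, r=3))  # inline_write: v1 is latest and repairable
```

Returning only the operation type keeps the sketch small; Figure 3 additionally returns the latest value and barrier candidates, which s_rcw_setup needs.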
3.4 R/CW server pseudo-code

Server-side functions are listed in Figure 4. As presented, replica histories include data for every candidate; this is inefficient. In practice, the data in a candidate is treated specially: only the data for the latest value candidate is included in the replica history returned to clients. Even then, it is not included in responses to a write or inline_write, since the client knows the data value it sent the server. If the client needs a previous candidate's data, it can request it via its timestamp.

To be safe in the crash-recovery failure model, servers must update their state in an atomic step after accepting a candidate. Also due to the crash-recovery failure model, clients re-send requests until they receive a response; thus, a server may receive repeated requests. To handle repeated requests, a server checks its replica history to determine whether it has already accepted the requested candidate (lines 1313-1316). To reclaim storage space, servers can prune their replica histories (lines 1324-1328). Unfortunately, if storage space is reclaimed, it is not always possible for servers to determine whether they have accepted a candidate. As such, there is (in theory) a tension between bounded storage capacity and clients being able to determine whether they established a candidate.

Pseudo-code for the server-side function s_rcw_setup is listed in Figure 5. This function sets up the candidate for a server to accept based on the conditioned-on object history set and data sent by the client.
s_rcw_initialize():
1200: s.ReplicaHistory := {(0, 0, ⊥)}
1201: ∀s′ ∈ U, s.α[s′] := hmac(s, s′, s.ReplicaHistory)

s_rcw_request(Data, ObjectHistorySet):
1300: /* Reply to read requests. */
1301: if (ObjectHistorySet = ⊥) then
1302:   reply(s, SUCCESS, s.⟨ReplicaHistory, α⟩)
1303: end if
1304: /* Validate authenticators. */
1305: for each (s′ ∈ U) do
1306:   if (hmac(s, s′, ObjectHistorySet[s′].ReplicaHistory) ≠ ObjectHistorySet[s′].α[s]) then
1307:     ObjectHistorySet[s′].ReplicaHistory := {(0, 0, ⊥)}
1308:   end if
1309: end for
1310: /* Setup candidate and determine operation type. */
1311: ⟨CWType, Candidate, LT_current⟩ := s_rcw_setup(Data, ObjectHistorySet)
1312: /* Determine if this is a repeated request. */
1313: if (Candidate ∈ s.ReplicaHistory) then
1314:   /* Reply with success, but send current history. */
1315:   reply(s, SUCCESS, s.⟨ReplicaHistory, α⟩)
1316: end if
1317: /* Validate that conditioned-on object history set is current. */
1318: if (max(s.ReplicaHistory.LT) > LT_current) then
1319:   reply(s, FAIL, s.⟨ReplicaHistory, α⟩)
1320: end if
1321: atomic
1322:   s.ReplicaHistory := s.ReplicaHistory ∪ {Candidate}
1323:   ∀s′ ∈ U, s.α[s′] := hmac(s, s′, s.ReplicaHistory)
1324:   if (CWType ∈ {write, inline_write}) then
1325:     /* By definition, the conditioned-on candidate is established. */
1326:     PrunedHistory := {Candidate′ : (Candidate′ ∈ s.ReplicaHistory) ∧ (Candidate′.LT < Candidate.LT_co)}
1327:     s.ReplicaHistory := s.ReplicaHistory \ PrunedHistory
1328:   end if
1329: end atomic
1330: reply(s, SUCCESS, s.⟨ReplicaHistory, α⟩)

Figure 4. Server-side R/CW pseudo-code for server s.
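The acceptance logic of s_rcw_request (repeated-request detection, the currency check, and pruning) can be sketched in Python as below. This is a hypothetical simplification: authenticators are omitted, candidates are `(lt, lt_co)` integer pairs, and a `threading.Lock` stands in for the atomic block. Unlike Figure 4, both checks run while holding the lock, so a concurrent request cannot slip in between a check and the update.

```python
import threading


class ReplicaServer:
    """Toy replica of one object; candidates are (lt, lt_co) integer pairs."""

    def __init__(self):
        self.history = {(0, 0)}        # initial candidate (0, 0, ⊥), data omitted
        self._lock = threading.Lock()  # plays the role of the atomic block

    def request(self, candidate, lt_current, prune):
        """Accept `candidate` if the client's conditioned-on history is current."""
        _, lt_co = candidate
        with self._lock:
            if candidate in self.history:
                return "SUCCESS"       # repeated request (lines 1313-1316)
            if max(c[0] for c in self.history) > lt_current:
                return "FAIL"          # conditioned-on history is stale (lines 1318-1320)
            self.history.add(candidate)
            if prune:                  # write or inline_write (lines 1324-1328)
                self.history = {c for c in self.history if c[0] >= lt_co}
            return "SUCCESS"


s = ReplicaServer()
print(s.request((1, 0), lt_current=0, prune=True))  # SUCCESS
print(s.request((2, 1), lt_current=1, prune=True))  # SUCCESS; prunes (0, 0)
print(s.request((2, 1), lt_current=1, prune=True))  # SUCCESS (repeated request)
print(s.request((3, 1), lt_current=1, prune=True))  # FAIL: timestamp 2 > 1
print(sorted(s.history))                            # [(1, 0), (2, 1)]
```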
s_rcw_setup(Data, ObjectHistorySet):
1400: ⟨CWType, LatestCandidate, LatestBarrier⟩ := rcw_classify(ObjectHistorySet)
1401: LT_co := LatestCandidate.LT
1402: if (CWType = write) then
1403:   LT.Time := rcw_latest_time(ObjectHistorySet).Time + 1  (= LatestCandidate.LT.Time + 1)
1404:   LT.BarrierFlag := false
1405:   LT.ClientID := ClientID  /* Client ID is known from authenticated channel. */
1406:   LT.DataVerifier := hash(Data)
1407:   LT.OHSVerifier := hash(ObjectHistorySet)
1408:   LT_current := LatestCandidate.LT  (= LT_co)
1409: else if (CWType = barrier) then
1410:   Data := ⊥
1411:   LT.Time := rcw_latest_time(ObjectHistorySet).Time + 1
1412:   LT.BarrierFlag := true
1413:   LT.ClientID := ClientID
1414:   LT.DataVerifier := ⊥
1415:   LT.OHSVerifier := hash(ObjectHistorySet)
1416:   LT_current := LT
1417: else if (CWType = copy) then
1418:   Data := LatestCandidate.Data
1419:   LT.Time := rcw_latest_time(ObjectHistorySet).Time + 1  (= LatestBarrier.LT.Time + 1)
1420:   LT.BarrierFlag := false
1421:   LT.ClientID := ClientID
1422:   LT.DataVerifier := hash(Data)
1423:   LT.OHSVerifier := hash(ObjectHistorySet)
1424:   LT_current := LatestBarrier.LT
1425: else if (CWType = inline_write) then
1426:   ⟨LT, LT_co, Data⟩ := LatestCandidate
1427:   LT_current := LT
1428: else
1429:   /* CWType = inline_barrier */
1430:   ⟨LT, LT_co, Data⟩ := LatestBarrier
1431:   LT_current := LT
1432: end if
1433: return (⟨CWType, ⟨LT, LT_co, Data⟩, LT_current⟩)

Figure 5. Server-side R/CW logic for setting up the candidate to accept.
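The timestamp construction in Figure 5 can be sketched in Python as follows. The sketch is hypothetical: it uses SHA-256 from `hashlib` as the hash, assumes the object history set has already been serialized canonically to bytes (`ohs_bytes`), and covers only the three operation types that create a new timestamp; inline operations reuse an existing candidate's timestamp.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LT:
    time: int
    barrier_flag: bool
    client_id: str
    data_verifier: Optional[str]
    ohs_verifier: str


def digest(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def setup_lt(cw_type: str, latest_time: int, client_id: str,
             data: Optional[bytes], ohs_bytes: bytes) -> LT:
    """Build the new logical timestamp for a write, copy, or barrier."""
    if cw_type in ("write", "copy"):
        # Value candidates: next Time, bound to the data and the conditioned-on OHS.
        return LT(latest_time + 1, False, client_id, digest(data), digest(ohs_bytes))
    if cw_type == "barrier":
        # Barriers carry no data, so there is no data verifier.
        return LT(latest_time + 1, True, client_id, None, digest(ohs_bytes))
    raise ValueError(f"{cw_type} reuses an existing candidate's timestamp")
```

Because ClientID and both verifiers are part of the timestamp, two clients that pick the same Time still produce distinct timestamps, and each timestamp is tied to a specific data value and conditioned-on object history set.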
4 Multi-object operations


In this section we discuss how to extend the R/CW protocol so that conditional-write operations may condition on multiple objects. Multi-object writes provide a different safety guarantee than single-object writes: strict serializability [3] instead of linearizability [5]. Linearizability applies only to operations on individual objects, whereas strict serializability guarantees that all writes across all objects, including multi-object writes, appear to execute in some serial order that is consistent with the real-time order of non-overlapping operations.

A multi-object conditional-write atomically writes a set of objects. To perform such a conditional-write, a client includes a conditioned-on OHS for each object being written. The set of objects and each object's corresponding conditioned-on OHS are referred to as the multi-object history set (multi-OHS). Each server locks its local version of each object in the multi-OHS and validates that each conditioned-on OHS is current. The local locks are acquired in object ID order to avoid the possibility of local deadlocks. Assuming validation passes for all of the objects in the multi-OHS, the server accepts the conditional-write for all the objects in the multi-OHS simultaneously.
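Acquiring per-object locks in one global order is the standard way to prevent deadlock between concurrent multi-object requests. The Python sketch below of the server-side step is hypothetical: each object's state is reduced to the timestamp of its latest accepted candidate, `conditioned_on` maps each object ID in the multi-OHS to the timestamp the client believes is current, and validation is the same "nothing newer than what the client conditioned on" test used for single objects.

```python
import threading
from typing import Dict, Iterable


class MultiObjectServer:
    def __init__(self, object_ids: Iterable[str]):
        self.locks = {oid: threading.Lock() for oid in object_ids}
        self.latest = {oid: 0 for oid in self.locks}  # latest accepted LT per object

    def conditional_write(self, conditioned_on: Dict[str, int], new_lt: int) -> bool:
        """Accept new_lt for every object in the multi-OHS, or for none of them."""
        ordered = sorted(conditioned_on)  # lock in object ID order to avoid deadlock
        for oid in ordered:
            self.locks[oid].acquire()
        try:
            if any(self.latest[oid] > conditioned_on[oid] for oid in ordered):
                return False              # some conditioned-on OHS is not current
            for oid in ordered:
                self.latest[oid] = new_lt
            return True
        finally:
            for oid in reversed(ordered):
                self.locks[oid].release()


server = MultiObjectServer(["a", "b"])
print(server.conditional_write({"b": 0, "a": 0}, new_lt=1))  # True
print(server.conditional_write({"a": 1, "b": 0}, new_lt=2))  # False: b is already at 1
print(server.latest)                                        # {'a': 1, 'b': 1}
```

Sorting the object IDs means two requests that list the same objects in different orders still lock them in the same order, so neither can hold one lock while waiting for the other's.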

So  long  as  a  candidate  is  classified  as  complete  or  incomplete,  no  additional  logic  is  required. 
However, to repair multi-object conditional-writes, clients must determine which objects are in the
multi-OHS.  As  such,  the  multi-OHS  is  included  in  the  timestamp.  Note  that  all  objects  written  by 
a  multi-object  conditional-write  have  the  same  multi-OHS  in  their  timestamp. 

To illustrate multi-object repair, consider a multi-object conditional-write that writes two objects, O_a and O_b. The multi-object conditional-write results in one candidate for each object, C_a and C_b respectively. Now consider a client that queries O_a and classifies C_a as repairable. To repair C_a, the client must fetch a current object history for O_b, because O_b is in the multi-OHS of C_a. Two cases follow:

- If C_a is in fact established, then there could exist a subsequent established candidate at O_b that conditions on C_b. This case requires that C_a be reclassified as complete.
- If C_a is not established, then subsequent operations at O_b may preclude C_b from ever being established. For example, a barrier and a copy could establish another candidate at O_b with a higher timestamp than C_b. This case requires that C_a be reclassified as incomplete.

Such reclassification of a repairable candidate, based on the objects in its multi-OHS, is called classification by deduction. If the repairable candidate lists other objects in its multi-OHS, then classification by deduction must be performed. If classification by deduction does not result in reclassification, then repair is performed. As for individual objects, repair consists of barrier and copy operations. The multi-OHSs of multi-object barriers and multi-object copies contain the same set of objects as the multi-OHS of the repairable candidate.

Because multi-object operations are atomic, classification by deduction cannot yield conflicting reclassifications. Exactly one of the following holds:

- all of the objects in the multi-OHS are classified as repairable;
- some are classified complete (implying that all are complete); or
- some are classified incomplete (implying that all are incomplete).
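Classification by deduction can be sketched as a small Python function. The sketch assumes that each object in the multi-OHS has already been classified from a current object history. The enum Cls and the function classify_by_deduction are hypothetical names.

```python
from enum import Enum


class Cls(Enum):
    COMPLETE = "complete"
    REPAIRABLE = "repairable"
    INCOMPLETE = "incomplete"


def classify_by_deduction(per_object):
    """Deduce a single classification for a multi-object candidate.

    per_object maps each object ID in the multi-OHS to the classification of
    that object's candidate. Because multi-object operations are atomic, one
    complete (or incomplete) object determines the classification of all.
    """
    values = set(per_object.values())
    if Cls.COMPLETE in values and Cls.INCOMPLETE in values:
        # Should be impossible for an atomic multi-object operation.
        raise ValueError("conflicting classifications")
    if Cls.COMPLETE in values:
        return Cls.COMPLETE
    if Cls.INCOMPLETE in values:
        return Cls.INCOMPLETE
    return Cls.REPAIRABLE  # all repairable: perform multi-object barrier/copy
```

For the two-object example above, the function returns COMPLETE once C_b is found to be complete. While both candidates remain repairable, it returns REPAIRABLE, and the client proceeds with repair.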

5  Query/Update  (Q/U)  protocol 

The Query/Update (Q/U) protocol is a variant of the R/CW protocol; it is described in detail in the main paper [1]. The correctness of the Q/U protocol is based on the correctness of the R/CW protocol.

This section presents extended pseudo-code for the Query/Update (Q/U) protocol. Comparing the Q/U protocol pseudo-code with the R/CW protocol pseudo-code makes it clear that one is a variant of the other. The Q/U protocol pseudo-code is somewhat longer and more detailed than the R/CW protocol pseudo-code for three reasons:

- the operations-based interface it provides;
- the need for object syncing; and
- the inclusion of some performance optimizations.




structures, types, & enumerations:
1500: /* Enumerations. */
1501: Class ∈ {query, update}
1502: Type ∈ {method, inline_method, copy, barrier, inline_barrier}
1503: /* Structures. */
1504: Operation = {
1505:   Method             /* Method to invoke on the object. */
1506:   Class              /* Class of operation. */
1507:   Argument           /* Argument(s) passed into method. */
1508: }
1509: LT = {               /* Logical timestamp. */
1510:   Time               /* Major component of logical time. */
1511:   BarrierFlag        /* TRUE for barriers. */
1512:   ClientID           /* Client ID. */
1513:   Operation          /* Operation to be performed on conditioned-on object version. */
1514:   ObjectHistorySet   /* Conditioned-on ObjectHistorySet. */
1515: } /* Operation and ObjectHistorySet replaced with single hash in the implementation. */
1516: Candidate = {        /* Candidates are initialized to ⟨0, 0⟩. */
1517:   LT                 /* Timestamp of corresponding object version. */
1518:   LTco               /* Timestamp of conditioned-on object version. */
1519: }
1520: /* Types. */
1521: ReplicaHistory = {Candidate}   /* An ordered set of candidates. */
1522: α = HMAC[U]   /* An authenticator is an array of HMACs indexed by server (U is the universe of servers). */
1523: ObjectHistorySet = ⟨ReplicaHistory, α⟩[U]   /* An array of replica histories indexed by server. */

Figure 6. Enumerations, structures, and data types used in pseudo-code.


In this section we also present a longer example execution than the one in the main paper. This example execution is for a queue object, which requires the semantics provided by the Q/U protocol.

5.1  Data  structures 

Symbols  used  in  the  pseudo-code  are  listed  in  Table  1  in  §3.  Enumerations,  structures,  and  types 
used  in  the  pseudo-code  are  given  in  Figure  6. 
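For readers who prefer executable definitions, the structures in Figure 6 can be approximated with Python dataclasses. This is a sketch, not the implementation, and it makes the following assumptions:

- The LT comparison order is Time, then BarrierFlag, then ClientID, then a digest that stands in for Operation and ObjectHistorySet. This order is an assumption.
- Authenticators are omitted.
- An object history set is modeled as a dict from server ID to a frozenset of candidates.

The helpers qu_order and qu_latest_time mirror lines 2300 and 2400.

```python
from dataclasses import dataclass
from enum import Enum


class OpClass(Enum):
    QUERY = "query"
    UPDATE = "update"


class OpType(Enum):
    METHOD = "method"
    INLINE_METHOD = "inline_method"
    COPY = "copy"
    BARRIER = "barrier"
    INLINE_BARRIER = "inline_barrier"


@dataclass(frozen=True)
class Operation:
    method: str            # method to invoke on the object
    op_class: OpClass      # query or update
    argument: object = None


@dataclass(frozen=True, order=True)
class LT:
    # Assumed comparison order: fields compare in declaration order.
    time: int = 0
    barrier_flag: bool = False
    client_id: int = 0
    digest: bytes = b""    # stands in for Operation and ObjectHistorySet


@dataclass(frozen=True)
class Candidate:
    lt: LT = LT()          # candidates are initialized to <0, 0>
    lt_co: LT = LT()


def qu_order(candidate, ohs):
    """Number of servers whose replica history contains `candidate`."""
    return sum(1 for history in ohs.values() if candidate in history)


def qu_latest_time(ohs):
    """Largest LT that appears in any replica history of the OHS."""
    return max(c.lt for history in ohs.values() for c in history)
```

Frozen dataclasses are hashable, so candidates can be stored in sets. This is what the membership test in qu_order relies on.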
5.2 Client-side


Pseudo-code for client-side functions is given in Figure 7. The pseudo-code for the query c_qu_fetch includes the optimization that handles servers executing queries optimistically when the client's object history set is not current. It also includes, though not as complete pseudo-code, the optimization for returning answers from the latest complete object version even if there is a later incomplete candidate. This optimization prevents queries from contending with a single client that is performing updates.
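The control flow of c_qu_fetch can be sketched as follows. The quorum RPC, classification and repair are passed in as caller-supplied functions, which are hypothetical stand-ins for c_qu_quorum_rpc, qu_classify and c_qu_repair. As in Figure 7, the update-query contention optimization on lines 1809 to 1814 is elided.

```python
def client_fetch(quorum_rpc, classify, repair, ohs, q):
    """Sketch of c_qu_fetch.

    quorum_rpc(op, ohs) -> (answer, order, ohs)
    classify(ohs)       -> operation type as a string, e.g. "method"
    repair(ohs)         -> repaired ohs
    """
    op = ("fetch", "query", None)
    answer, order, ohs = quorum_rpc(op, ohs)
    while order < q:
        # If the returned OHS classifies as "method", the servers executed the
        # query optimistically on a complete version, so the answer is usable.
        if classify(ohs) == "method":
            return answer
        ohs = repair(ohs)
        answer, order, ohs = quorum_rpc(op, ohs)
    return answer
```

In the first call below, the initial reply has too few votes and the OHS is not complete, so one repair precedes the successful retry.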

Pseudo-code for the function c_qu_quorum_rpc is given in Figure 8. As in the R/CW pseudo-code, a rudimentary quorum probing policy is shown. Note that c_qu_quorum_rpc differs from c_rcw_quorum_rpc in some substantive ways. The voting and synthesis of responses performed on lines 2017 to 2020 ensure that the answer returned matches that of a benign server. (A Byzantine faulty server could respond with success but supply an incorrect answer.)
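A sketch of the voting step follows. It assumes each SUCCESS reply is represented as a hashable (answer, replica history) pair. The function name vote is hypothetical. The returned count plays the role of Order, which the caller compares against q.

```python
from collections import Counter


def vote(success_responses):
    """Return (answer, vote_count) for the most common SUCCESS reply.

    Each reply is an (answer, replica_history) pair. A Byzantine server that
    reports success with a wrong answer lands in a separate, smaller bucket.
    """
    if not success_responses:
        return None, 0
    counts = Counter(success_responses)
    response, count = counts.most_common(1)[0]
    return response[0], count
```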

Figure 9 lists pseudo-code for the function qu_classify. The construction of candidates for inline repair of value candidates and barrier candidates is included in this extended pseudo-code.
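A simplified, runnable rendering of qu_classify follows. Each LT is reduced to a (time, is_barrier) tuple, and an object history set is reduced to a list of replica histories. The test values q = 5 and r = 3 are assumptions for b = 1, chosen to be consistent with the outcomes in Table 2.

```python
def order_counts(replica_histories):
    """Map each LT to the number of replica histories containing it (qu_order)."""
    counts = {}
    for history in replica_histories:
        for lt in history:
            counts[lt] = counts.get(lt, 0) + 1
    return counts


def qu_classify(replica_histories, q, r):
    """Return the operation type a client should perform.

    LTs are (time, is_barrier) tuples, so max() orders them by time.
    """
    order = order_counts(replica_histories)
    latest_time = max(order)

    def latest_candidate(is_barrier):
        # Latest value (or barrier) candidate whose order is at least r.
        lts = [lt for lt, n in order.items() if n >= r and lt[1] == is_barrier]
        return max(lts) if lts else None

    latest_object = latest_candidate(False)
    latest_barrier = latest_candidate(True)
    if latest_time == latest_object and order[latest_object] >= q:
        return "method"
    if latest_time == latest_object and order[latest_object] >= r:
        return "inline_method"
    if latest_time == latest_barrier and order[latest_barrier] >= q:
        return "copy"
    if latest_time == latest_barrier and order[latest_barrier] >= r:
        return "inline_barrier"
    return "barrier"
```

The test below uses Y's and Z's replica histories from the fifth sequence of Table 2. The sketch yields a barrier for Y and an inline repair for Z, which matches the narrative in §5.4.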
c_qu_initialize(): /* Client initialization. */
1600: for each (s ∈ U) do
1601:   ObjectHistorySet[s].ReplicaHistory := {⟨0, 0⟩}
1602:   ObjectHistorySet[s].α := ⊥
1603: end for

c_qu_increment(Argument): /* Example update operation. */
1700: Operation := ⟨inc, update, Argument⟩
1701: ⟨Answer, Order, ObjectHistorySet⟩ := c_qu_quorum_rpc(Operation, ObjectHistorySet)
1702: while (Order < q) do
1703:   ObjectHistorySet := c_qu_repair(ObjectHistorySet)
1704:   ⟨Answer, Order, ObjectHistorySet⟩ := c_qu_quorum_rpc(Operation, ObjectHistorySet)
1705: end while
1706: return (Answer)

c_qu_fetch(): /* Example query operation. */
1800: Operation := ⟨fetch, query, ⊥⟩
1801: ⟨Answer, Order, ObjectHistorySet⟩ := c_qu_quorum_rpc(Operation, ObjectHistorySet)
1802: while (Order < q) do
1803:   /* See if query was executed optimistically. */
1804:   ⟨Type, ⊥, ⊥⟩ := qu_classify(ObjectHistorySet)
1805:   if (Type = method) then
1806:     return (Answer)
1807:   end if
1808:   /* Try to avoid update-query contention. */
1809:   if (Order < r) then
1810:     /* Eliding details of (i) determining if the latest two entries in ObjectHistorySet are the incomplete */
1811:     /* candidate and the established candidate it is conditioned on, and (ii) determining the answer */
1812:     /* Answer′ returned by the conditioned-on candidate. */
1813:     return (Answer′)
1814:   end if
1815:   ObjectHistorySet := c_qu_repair(ObjectHistorySet)
1816:   ⟨Answer, Order, ObjectHistorySet⟩ := c_qu_quorum_rpc(Operation, ObjectHistorySet)
1817: end while
1818: return (Answer)

c_qu_repair(ObjectHistorySet): /* Deal with failures and contention. */
1900: ⟨Type, ⊥, ⊥⟩ := qu_classify(ObjectHistorySet)
1901: while (Type ≠ method) do
1902:   backoff() /* Backoff to avoid livelock. */
1903:   /* Perform a barrier or copy (depends on ObjectHistorySet). */
1904:   ⟨⊥, ⊥, ObjectHistorySet⟩ := c_qu_quorum_rpc(⊥, ObjectHistorySet)
1905:   ⟨Type, ⊥, ⊥⟩ := qu_classify(ObjectHistorySet)
1906: end while
1907: return (ObjectHistorySet)

Figure 7. Client-side pseudo-code.
c_qu_quorum_rpc(Operation, ObjectHistorySet): /* Quorum RPC from client to servers. */
2000: ResponseSet := SuccessSet := ∅
2001: repeat
2002:   /* Eliding probing policy. For simplicity broadcast to all servers. */
2003:   for each (s ∈ U \ ResponseSet.s) do
2004:     send(s, Operation, ObjectHistorySet)
2005:   end for
2006:   if (poll() = true) then
2007:     ⟨s, Status, Answer, ⟨ReplicaHistory, α⟩⟩ := receive()
2008:     if (s ∉ ResponseSet.s) then
2009:       ObjectHistorySet[s] := ⟨ReplicaHistory, α⟩
2010:       ResponseSet := ResponseSet ∪ {s}
2011:     end if
2012:     if (Status = SUCCESS) then
2013:       SuccessSet := SuccessSet ∪ {⟨Answer, ⟨ReplicaHistory, α⟩⟩}
2014:     end if
2015:   end if
2016: until (∃Q ⊆ ResponseSet : Q ∈ 𝒬)
2017: /* Use voting to identify response from benign server. */
2018: VoteCount := max(∀Response ∈ ResponseSet : qu_count(Response, SuccessSet))
2019: Response := {Response : qu_count(Response, SuccessSet) = VoteCount}
2020: return (⟨Response.Answer, qu_count(Response, SuccessSet), ObjectHistorySet⟩)

Figure 8. Quorum RPC pseudo-code.


qu_classify(ObjectHistorySet):
2100: /* Get latest object version, barrier version, and timestamp. */
2101: LatestObjectVersion := qu_latest_candidate(ObjectHistorySet, false)
2102: LatestBarrierVersion := qu_latest_candidate(ObjectHistorySet, true)
2103: LTlatest := qu_latest_time(ObjectHistorySet)
2104: /* Determine which type of operation to perform. */
2105: if (LTlatest = LatestObjectVersion.LT) ∧ (qu_order(LatestObjectVersion, ObjectHistorySet) ≥ q) then
2106:   Type := method
2107: else if (LTlatest = LatestObjectVersion.LT) ∧ (qu_order(LatestObjectVersion, ObjectHistorySet) ≥ r) then
2108:   Type := inline_method
2109: else if (LTlatest = LatestBarrierVersion.LT) ∧ (qu_order(LatestBarrierVersion, ObjectHistorySet) ≥ q) then
2110:   Type := copy
2111: else if (LTlatest = LatestBarrierVersion.LT) ∧ (qu_order(LatestBarrierVersion, ObjectHistorySet) ≥ r) then
2112:   Type := inline_barrier
2113: else
2114:   Type := barrier
2115: end if
2116: return (⟨Type, LatestObjectVersion, LatestBarrierVersion⟩)

qu_latest_candidate(ObjectHistorySet, BarrierFlag):
2200: CandidateSet := {Candidate : (qu_order(Candidate, ObjectHistorySet) ≥ r) ∧
2201:   (Candidate.LT.BarrierFlag = BarrierFlag)}
2202: Candidate := {Candidate : (Candidate ∈ CandidateSet) ∧ (Candidate.LT = max(CandidateSet.LT))}
2203: return (Candidate)

qu_order(Candidate, ObjectHistorySet):
2300: return (|{s ∈ U : Candidate ∈ ObjectHistorySet[s].ReplicaHistory}|)

qu_latest_time(ObjectHistorySet):
2400: return (max(ObjectHistorySet[U].ReplicaHistory.LT))

Figure 9. Classification of an object history set.




5.3  Server-side 

Figure  10  lists  pseudo-code  for  server-side  functions.  The  logic  for  inline  repair  of  value  candidates 
and  barrier  candidates  is  included.  Optimistic  execution  of  queries  is  shown.  The  logic  necessary 
to  handle  repeated  requests  is  given  (i.e.,  checking  for  the  candidate  in  the  replica  history,  and 
storing/retrieving  answers  in  addition  to  object  versions).  Pruning  replica  histories  (and  garbage 
collecting  object  versions  and  answers)  is  shown. 
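To make the handling of repeated requests concrete, here is a toy single-server Python sketch. It makes the following simplifications:

- Logical timestamps are plain integers.
- LTcurrent equals LTco (the method case of s_qu_setup).
- Authenticators, object sync and pruning are omitted.

ServerReplica and its accept method are hypothetical names.

```python
class ServerReplica:
    """Toy replica: history is a set of (LT, LTco) pairs, and versions maps
    LT -> (object, answer), mirroring store/retrieve in Figure 10."""

    def __init__(self):
        self.history = {(0, 0)}
        self.versions = {0: (None, None)}

    def accept(self, lt, lt_co, method, argument):
        # Repeated request: the candidate is already in the replica history,
        # so return the stored answer instead of invoking the method again.
        if (lt, lt_co) in self.history:
            return "SUCCESS", self.versions[lt][1]
        # Reject if the conditioned-on version is not the latest local one.
        if max(t for t, _ in self.history) > lt_co:
            return "FAIL", None
        obj, _ = self.versions[lt_co]
        new_obj, answer = method(obj, argument)
        self.history.add((lt, lt_co))
        self.versions[lt] = (new_obj, answer)
        return "SUCCESS", answer


def enq(queue, value):
    return (queue or ()) + (value,), None


def deq(queue, _unused):
    return queue[1:], queue[0]
```

Storing the answer next to the object version matters for non-idempotent methods such as deq. A retransmitted deq must return the same value without removing a second element.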

Figure  11  lists  pseudo-code  for  the  server-side  function  s_qu_setup.  The  logic  for  constructing 
candidates  for  inline  repairs  of  value  candidates  and  barrier  candidates  is  included. 
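The timestamp construction in s_qu_setup can be sketched as below. The sketch makes the following simplifications:

- LT is reduced to (time, barrier, client_id).
- The Operation and ObjectHistorySet fields are omitted.
- The copy case's LT.Operation := LTco.Operation is omitted.

The function signature is hypothetical.

```python
from typing import NamedTuple, Optional


class LT(NamedTuple):
    time: int
    barrier: bool = False
    client_id: int = 0


class Candidate(NamedTuple):
    lt: LT
    lt_co: LT


def s_qu_setup(op_type, latest_time, latest_object: Optional[Candidate],
               latest_barrier: Optional[Candidate], client_id):
    """Return ((LT, LTco), LTcurrent) for the given operation type."""
    lt_co = latest_object.lt if latest_object else None
    if op_type == "method":
        lt = LT(latest_time.time + 1, False, client_id)
        return (lt, lt_co), lt_co
    if op_type == "barrier":
        lt = LT(latest_time.time + 1, True, client_id)
        return (lt, lt_co), lt
    if op_type == "copy":
        lt = LT(latest_time.time + 1, False, client_id)
        return (lt, lt_co), latest_barrier.lt
    if op_type == "inline_method":
        # Re-propose the existing value candidate unchanged.
        return (latest_object.lt, latest_object.lt_co), latest_object.lt
    # inline_barrier: re-propose the existing barrier candidate unchanged.
    return (latest_barrier.lt, latest_barrier.lt_co), latest_barrier.lt
```

The test below uses the state X observes in the fourth sequence of Table 2: latest time 3 and latest value candidate (2,1). The barrier case yields (4b, 2), and the subsequent copy yields (5, 2), as in the table.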

Figure 12 lists pseudo-code for object sync. Like c_qu_quorum_rpc, s_qu_object_sync broadcasts its requests, which is inefficient. In the implementation, servers are probed based on the information in the conditioned-on object history set.
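A sketch of the acceptance rule in s_qu_object_sync follows. It runs over a finite list of responses in arrival order; the pseudo-code instead loops until enough matching responses arrive. Responses are assumed to be hashable (object, answer) pairs, and object_sync is a hypothetical name.

```python
from collections import Counter


def object_sync(responses, b):
    """Return the first (object, answer) pair reported by at least b + 1
    servers. At most b servers are Byzantine, so b + 1 matching copies
    include at least one from a benign server. Returns None if no pair
    reaches the threshold."""
    counts = Counter()
    for pair in responses:  # responses in arrival order
        counts[pair] += 1
        if counts[pair] >= b + 1:
            return pair
    return None
```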
s_qu_initialize(): /* Initialize server s. */
2500: s.ReplicaHistory := {⟨0, 0⟩}
2501: ∀s′ ∈ U, s.α[s′] := hmac(s, s′, s.ReplicaHistory)

s_qu_request(Operation, ObjectHistorySet): /* Handle request at server s. */
2600: Answer := ⊥ /* barrier and inline_barrier return null answers. */
2601: /* Validate authenticators. */
2602: for each (s′ ∈ U) do
2603:   if (hmac(s, s′, ObjectHistorySet[s′].ReplicaHistory) ≠ ObjectHistorySet[s′].α[s]) then
2604:     ObjectHistorySet[s′].ReplicaHistory := {⟨0, 0⟩} /* Cull invalid ReplicaHistorys. */
2605:   end if
2606: end for
2607: /* Setup candidate and determine operation type. */
2608: ⟨Type, ⟨LT, LTco⟩, LTcurrent⟩ := s_qu_setup(Operation, ObjectHistorySet)
2609: /* Determine if this is a repeated request. */
2610: if (⟨LT, LTco⟩ ∈ s.ReplicaHistory) then
2611:   ⟨⊥, Answer⟩ := retrieve(LT)
2612:   reply(s, SUCCESS, Answer, s.⟨ReplicaHistory, α⟩) /* Reply with success, but send current history. */
2613: end if
2614: /* Validate that conditioned-on object history set is current. */
2615: if (qu_latest_time(s.ReplicaHistory) > LTcurrent) then
2616:   /* Optimistically execute query operations on latest object version. */
2617:   if (Operation.Class = query) then
2618:     Object := retrieve(qu_latest_time(s.ReplicaHistory)) /* Retrieve latest local object version. */
2619:     ⟨⊥, Answer⟩ := Operation.Method(Object, Operation.Argument)
2620:   end if
2621:   /* Answer = ⊥ except in the case of optimistic query execution. */
2622:   reply(s, FAIL, Answer, s.⟨ReplicaHistory, α⟩)
2623: end if
2624: if (Type ∈ {method, inline_method, copy}) then
2625:   /* Retrieve conditioned-on object so that method can be invoked. */
2626:   ⟨Object, Answer⟩ := retrieve(LTco) /* Attempt to retrieve it locally. */
2627:   if ((Object = ⊥) ∧ (LTco > 0)) then
2628:     ⟨Object, Answer⟩ := s_qu_object_sync(LTco) /* Retrieve from other servers via object sync. */
2629:   end if
2630: end if
2631: if (Type ∈ {method, inline_method}) then
2632:   ⟨Object, Answer⟩ := Operation.Method(Object, Operation.Argument)
2633:   if (Operation.Class = query) then
2634:     reply(s, SUCCESS, Answer, s.⟨ReplicaHistory, α⟩)
2635:   end if
2636: end if
2637: atomic
2638:   s.ReplicaHistory := s.ReplicaHistory ∪ {⟨LT, LTco⟩} /* Update replica history. */
2639:   ∀s′ ∈ U, s.α[s′] := hmac(s, s′, s.ReplicaHistory) /* Update authenticator. */
2640:   if (Type ∈ {method, inline_method, copy}) then
2641:     store(LT, ⟨Object, Answer⟩) /* Store object version locally indexed by logical timestamp. */
2642:   end if
2643:   if (Type ∈ {method, inline_method}) then
2644:     /* By definition, the conditioned-on candidate is established. */
2645:     PrunedHistory := {Candidate : (Candidate ∈ s.ReplicaHistory) ∧ (Candidate.LT < LTco)}
2646:     s.ReplicaHistory := s.ReplicaHistory \ PrunedHistory
2647:     remove_garbage(PrunedHistory.LT) /* Remove local object versions of pruned candidates. */
2648:   end if
2649: end atomic
2650: reply(s, SUCCESS, Answer, s.⟨ReplicaHistory, α⟩)

Figure 10. Server-side pseudo-code.
s_qu_setup(Operation, ObjectHistorySet):
2700: ⟨Type, LatestObjectVersion, LatestBarrierVersion⟩ := qu_classify(ObjectHistorySet)
2701: LTco := LatestObjectVersion.LT
2702: if (Type = method) then
2703:   LT.Time := qu_latest_time(ObjectHistorySet).Time + 1
2704:   LT.BarrierFlag := FALSE
2705:   LT.ClientID := ClientID /* Client ID is known from authenticated channel. */
2706:   LT.Operation := Operation
2707:   LT.ObjectHistorySet := ObjectHistorySet
2708:   LTcurrent := LatestObjectVersion.LT /* = LTco */
2709: else if (Type = barrier) then
2710:   LT.Time := qu_latest_time(ObjectHistorySet).Time + 1
2711:   LT.BarrierFlag := TRUE
2712:   LT.ClientID := ClientID
2713:   LT.Operation := ⊥
2714:   LT.ObjectHistorySet := ObjectHistorySet
2715:   LTcurrent := LT
2716: else if (Type = copy) then
2717:   LT.Time := qu_latest_time(ObjectHistorySet).Time + 1
2718:   LT.BarrierFlag := FALSE
2719:   LT.ClientID := ClientID
2720:   LT.Operation := LTco.Operation
2721:   LT.ObjectHistorySet := ObjectHistorySet
2722:   LTcurrent := LatestBarrierVersion.LT
2723: else if (Type = inline_method) then
2724:   LT := LatestObjectVersion.LT
2725:   LTco := LatestObjectVersion.LTco
2726:   LTcurrent := LT
2727: else
2728:   /* Type = inline_barrier */
2729:   LT := LatestBarrierVersion.LT
2730:   LTco := LatestBarrierVersion.LTco
2731:   LTcurrent := LT
2732: end if
2733: return (⟨Type, ⟨LT, LTco⟩, LTcurrent⟩)

Figure 11. Server-side pseudo-code to set up candidate and identify type of operation.
s_qu_object_sync(LT):
2800: /* Pseudo-code does not deal with servers pruning replica histories or garbage collecting object versions. */
2801: ResponseSet := ∅
2802: repeat
2803:   /* Eliding details of probing based on object history set. */
2804:   for each (s ∈ U) do
2805:     send(s, LT)
2806:   end for
2807:   if (poll() = true) then
2808:     ⟨Object, Answer⟩ := receive()
2809:     ResponseSet := ResponseSet ∪ {⟨Object, Answer⟩}
2810:     if (qu_count(⟨Object, Answer⟩, ResponseSet) ≥ b + 1) then
2811:       return (⟨Object, Answer⟩)
2812:     end if
2813:   end if
2814: until (false)

s_qu_object_source(LT):
2900: ⟨Object, Answer⟩ := c_qu_fetch(LT)
2901: if (Object ≠ ⊥) then
2902:   reply(s, ⟨Object, Answer⟩)
2903: end if

qu_count(Element, Set):
3000: return (|{Element : Element ∈ Set}|)

Figure 12. Pseudo-code for object syncing at server s.
5.4 Example Q/U protocol execution

An example execution of the Q/U protocol for a queue object is given in Table 2. The caption explains the structure of, and notation used in, the table. The object exports three methods:

- enq, an update that enqueues a value;
- deq, an update that dequeues a value; and
- front, a query that returns the value at the front of the queue.

The server configuration is based on the smallest quorum system for b = 1. Clients perform optimistic queries for the front method. The conditioned-on OHS sent with the enq and deq methods is not shown in the table. Horizontal double lines divide the client operations into six sequences of interest: the initial queue state, four illustrative sequences of operations, and the final queue state. For illustrative purposes, clients X and Z interact with the object's preferred quorum (the first five servers), and Y interacts with a non-preferred quorum (the last five servers).
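The server configuration of the example can be written down directly. The sketch assumes n = 5b + 1 servers and quorums of q = 4b + 1 servers. For b = 1 this gives the six servers and five-server quorums used in Table 2. The established threshold q - b follows the description of the fourth sequence. quorum_config is a hypothetical helper.

```python
def quorum_config(b):
    """Threshold configuration assumed for the example execution."""
    n = 5 * b + 1          # assumed number of servers
    q = 4 * b + 1          # assumed quorum size
    servers = [f"s{i}" for i in range(n)]
    return {
        "servers": servers,
        "preferred": servers[:q],       # quorum used by clients X and Z
        "non_preferred": servers[-q:],  # quorum used by client Y
        "q": q,
        "established": q - b,           # b less than the quorum size
    }
```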

In the first sequence, the queue state is initialized to the well-known null value at the well-known logical time zero. The second sequence demonstrates failure- and concurrency-free execution: client X performs a front and an enq that each complete in a single phase. In the third sequence, client Y performs a front that requires repair. Since there is no contention, Y performs inline repair. Server s5 performs object syncing to process the inline repair request. It also optimistically performs the front after object syncing and returns the answer to the client.

In the fourth sequence, concurrent updates by X and Y both abort because of contention; however, X establishes a candidate. An update can abort yet still establish a candidate because the established threshold is b less than the quorum size (4 in this case). At server s4, the enq from Y arrives before the enq from X. As such, s4 returns its replica history {(3,1), (1,0)} to X with FAIL. The candidate (3,1) in this replica history dictates that the timestamp of the barrier be 4b. For illustrative purposes, Y backs off. Client X subsequently completes barrier and copy operations.

Notice, though, that the replica histories returned by the servers in response to Y's enq requests are pruned. Servers prune their replica histories whenever they accept an update operation (or an inline repair), since such operations are conditioned on established candidates. Servers do not prune their replica histories when they accept copy and barrier operations, since such operations may condition on potential candidates. For example, server s1 returns {(2,1), (1,0)} rather than {(2,1), (1,0), (0,0)}, because the object history set sent by X with enq operation (2,1) proved that (1,0) was established.
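The pruning rule can be sketched on the example above. Candidates are (LT, LTco) pairs of integers here, an assumption that ignores barrier flags. accept_and_prune is a hypothetical name.

```python
def accept_and_prune(history, candidate, op_type):
    """Add candidate = (LT, LTco) to a replica history. For method and
    inline_method operations, drop every candidate with LT < LTco, since the
    conditioned-on candidate is established. Copies and barriers are not
    pruned because they may condition on candidates that are only potential."""
    _lt, lt_co = candidate
    history = set(history) | {candidate}
    if op_type in ("method", "inline_method"):
        history = {c for c in history if c[0] >= lt_co}
    return history
```

For server s1, accepting X's enq (2,1) on top of {(1,0), (0,0)} drops (0,0). Accepting the same candidate as a copy would keep it.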

In the fifth sequence, X crashes during an enq operation and leaves a potential candidate. The remaining clients, Y and Z, perform concurrent front operations at different quorums. This illustrates how a potential candidate can be classified as either repairable or incomplete. Y and Z concurrently attempt a barrier and an inline repair, respectively. In this example, Y establishes a barrier before Z's inline repair requests arrive at servers s3 and s4. As such, client Z aborts its inline repair operation and backs off. Subsequently, Y completes a copy operation and establishes a candidate (8,5) that copies forward the established candidate (5,2). Then Y dequeues a value from the queue. After backing off, Z attempts a front operation that requires an inline repair to complete. Notice that queries such as front can be piggy-backed on inline repairs. Finally, the sixth sequence lists the final server state.
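Operation                 | s0              | s1              | s2              | s3              | s4              | s5              | Result
---------------------------------------------------------------------------------------------------------------------------------------------
Initial queue state       | {0},⊥           | {0},⊥           | {0},⊥           | {0},⊥           | {0},⊥           | {0},⊥           |
=============================================================================================================================================
X completes front()       | {0},⊥           | {0},⊥           | {0},⊥           | {0},⊥           | {0},⊥           |                 | 0 complete, return ⊥
X completes enq(♣)        | (1,0),⊥,[♣]     | (1,0),⊥,[♣]     | (1,0),⊥,[♣]     | (1,0),⊥,[♣]     | (1,0),⊥,[♣]     |                 | 1 established
=============================================================================================================================================
Y begins front()...       |                 | {1,0},♣         | {1,0},♣         | {1,0},♣         | {1,0},♣         | {0},⊥           | 1 repairable
...Y performs inline()    |                 |                 |                 |                 |                 | (1,0),♣,[♣]     | 1 complete, return ♣
=============================================================================================================================================
X attempts enq(♦)...      | (2,1),⊥,[♣♦]    | (2,1),⊥,[♣♦]    | (2,1),⊥,[♣♦]    | (2,1),⊥,[♣♦]    | {3,1},FAIL      |                 | 2 potential
Y attempts enq(♥)         |                 | {2,1},FAIL      | {2,1},FAIL      | {2,1},FAIL      | (3,1),⊥,[♣♥]    | (3,1),⊥,[♣♥]    |
Y backs off               |                 |                 |                 |                 |                 |                 |
...X completes barrier()  | (4b,2),⊥        | (4b,2),⊥        | (4b,2),⊥        | (4b,2),⊥        | (4b,2),⊥        |                 | 4b established
...X completes copy()     | (5,2),⊥,[♣♦]    | (5,2),⊥,[♣♦]    | (5,2),⊥,[♣♦]    | (5,2),⊥,[♣♦]    | (5,2),⊥,[♣♦]    |                 | 5 established
=============================================================================================================================================
X crashes in enq(♠)       | (6,5),⊥,[♣♦♠]   | (6,5),⊥,[♣♦♠]   | (6,5),⊥,[♣♦♠]   |                 |                 |                 | 6 potential
Y begins front()...       |                 | {6,5},♣         | {6,5},♣         | {5,4b,2,1},♣    | {5,4b,2,1},♣    | {3,1},♣         | 6 incom., 5 repair.
Z begins front()...       | {6,5},♣         | {6,5},♣         | {6,5},♣         | {5,4b,2,1},♣    | {5,4b,2,1},♣    |                 | 6 repairable
...Y completes barrier()  |                 | (7b,5),⊥        | (7b,5),⊥        | (7b,5),⊥        | (7b,5),⊥        | (7b,5),⊥        | 7b established
...Z attempts inline()    |                 |                 |                 | {7b,...,1},FAIL | {7b,...,1},FAIL |                 |
Z backs off               |                 |                 |                 |                 |                 |                 |
...Y completes copy()     |                 | (8,5),♣,[♣♦]    | (8,5),♣,[♣♦]    | (8,5),♣,[♣♦]    | (8,5),♣,[♣♦]    | (8,5),♣,[♣♦]    | 8 established, ret. ♣
Y completes deq()         |                 | (9,8),♣,[♦]     | (9,8),♣,[♦]     | (9,8),♣,[♦]     | (9,8),♣,[♦]     | (9,8),♣,[♦]     | 9 complete, return ♣
Z begins front()...       | {6,5},♣         | {9,8},♦         | {9,8},♦         | {9,8},♦         | {9,8},♦         |                 | 9 repairable
...Z completes inline()   | (9,8),♦,[♦]     |                 |                 |                 |                 |                 | 9 complete, return ♦
=============================================================================================================================================
Final queue state         | {9,8},[♦]       | {9,8},[♦]       | {9,8},[♦]       | {9,8},[♦]       | {9,8},[♦]       | {9,8},[♦]       |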

Table 2. Example Q/U protocol execution for a queue object that supports enq (update), deq (update), and front (query) operations. Operations performed by three clients (X, Y, and Z) are listed in the left column. The middle columns list candidates stored by, and replies (replica histories or status codes) returned by, six benign servers (s0, ..., s5). For enq, deq, copy, and inline operations, the triple listed is the timestamp/conditioned-on timestamp pair, the answer from the server, and the queue state. If an operation is rejected because the conditioned-on OHS is not current, the server reply is listed: the server's replica history and FAIL. Except for the queue state, barrier operations are listed in the same way as the other update operations. For front operations, which are queries, the replica history and answer returned by the server are listed. The results in the right column that identify candidates as established or potential are based on the servers being benign. Time "flows" from the top row to the bottom row. Candidates are denoted (LT, LTco). For timestamps, only LT.Time is shown, with "b" appended for barriers (i.e., client ID, operation, and object history set are not shown). Replica histories are denoted {LT3, LT2, ...}; only each candidate's LT, not its LTco, is listed in the replica history. Queue state is listed in order, delimited by square brackets "[" and "]".